Structure preserving database encryption method and system

ABSTRACT

A database encryption system and method, the Structure Preserving Database Encryption (SPDE), is presented. In the SPDE method, each database cell is encrypted with its unique position. The SPDE method permits converting a conventional database index into a secure one, so that the time complexity of all queries is maintained. No one with access to the encrypted database can learn anything about its content without the encryption key. A secure index for an encrypted database is also provided. Furthermore, a secure database indexing system and method are described, providing protection against information leakage and unauthorized modifications by using encryption, dummy values and pooling, and supporting discretionary access control in a multi-user environment.

FIELD OF THE INVENTION

The present invention relates to database encryption and, more particularly, the invention relates to a structure preserving database encryption method and system, wherein no one with access to the encrypted database can learn anything about its content without the encryption key.

DEFINITIONS, ACRONYMS AND ABBREVIATIONS

Throughout this specification, the following definitions are employed:

AES: Short for Advanced Encryption Standard, a symmetric block cipher with a 128-bit block. AES has been adopted as an encryption standard and is expected to be used worldwide and analysed extensively, as was the case with its predecessor, the Data Encryption Standard (DES). AES has a fixed block size of 128 bits and a key size of 128, 192 or 256 bits, whereas the underlying Rijndael algorithm can be specified with key and block sizes in any multiple of 32 bits, with a minimum of 128 bits and a maximum of 256 bits.

B-Tree: B-Trees are tree data structures that are most commonly found in databases and filesystems. B-Trees keep data sorted and allow amortized logarithmic time insertions and deletions. B-Trees generally grow from the bottom up as elements are inserted, whereas most binary trees grow down. B-Trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when most nodes are in secondary storage such as hard drives.

B+-Tree: A B+-Tree is a type of tree data structure. It represents sorted data in a way that allows for efficient insertion and removal of elements. It is a dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node. A B+-Tree is a variation on a B-Tree. In a B+-Tree, in contrast to a B-Tree, all data are saved in the leaves. Internal nodes contain only keys and tree pointers. All leaves are at the same lowest level. Leaf nodes are also linked together as a linked list to make range queries easy. The maximum number of keys in a node is called the order of the B+-Tree. The minimum number of keys per node is ½ of the maximum number of keys. For example, if the order of a B+-Tree is n, each node (except for the root) must have between n/2 and n keys. The number of keys that may be indexed using a B+-Tree is a function of the order of the tree and its height.
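The dependence of capacity on order and height can be illustrated with a small sketch. It assumes that the order n is the maximum number of keys per node (as defined above), that an internal node holding n keys has n + 1 child pointers, and that the height counts levels including the leaf level. The function name is illustrative only.

```python
def max_indexed_keys(order: int, height: int) -> int:
    """Upper bound on the number of keys stored in the leaves of a B+-Tree.

    Assumptions (not taken from the specification):
    - `order` is the maximum number of keys per node;
    - a full internal node has `order + 1` children;
    - `height` is the number of levels, with the leaves at level `height`.
    """
    if order < 1 or height < 1:
        raise ValueError("order and height must be positive")
    max_leaves = (order + 1) ** (height - 1)  # every internal level fans out fully
    return order * max_leaves                  # every leaf is full


# Capacity grows exponentially with height for a fixed order.
for h in (1, 2, 3):
    print(h, max_indexed_keys(order=100, height=h))
```

Under these assumptions, a tree of order 100 with three levels can index up to 100 × 101² keys, which is why such indexes stay shallow even for large tables.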

Cipher: A cipher (also spelt cypher) is an algorithm for performing encryption (and the reverse, decryption): a series of well-defined steps that can be followed as a procedure. An alternative term is encipherment. The original information is known as plaintext, and the encrypted form as ciphertext.

Ciphertext: The ciphertext message contains all the information of the plaintext message, but is not in a format readable by a human or computer without the proper mechanism to decrypt it; it should resemble random gibberish to those not intended to read it.

DAC: Short for Discretionary Access Control. DAC defines basic access control policies to objects in a file system. Generally, these are set at the discretion of the object owner (file/directory permissions and user/group ownership). DAC is a means of restricting access to objects based on the identity and need-to-know of users and/or groups to which the object belongs. Controls are discretionary in the sense that a subject with a certain access permission is capable of passing that permission (directly or indirectly) to any other subject.

DBMS: A Database Management System (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data. A DBMS is actually a computer program (or more typically, a suite of them) designed to manage a database, a large set of structured data, and run operations on the data requested by numerous users. Typical examples of DBMS use include accounting, human resources and customer support systems. Originally found only in large companies with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office. DBMSs are found at the heart of most database applications.

DBA: A Database Administrator (DBA) is a person who is responsible for the environmental aspects of a database. The duties of a database administrator at a particular site vary, depending on the policies in place and the capabilities of the database management system (DBMS) for carrying them out. They nearly always include disaster recovery (backups and testing of backups), performance analysis, and some database design or assistance thereof.

DES: The Data Encryption Standard (DES) is the archetypal block cipher (a method for encrypting information): an algorithm that takes a fixed-length string of plaintext bits and transforms it through a series of complicated operations into another ciphertext bitstring of the same length. In the case of DES, the block size is 64 bits. DES also uses a key to customize the transformation, so that decryption can only be performed by those who know the particular key used to encrypt. The key ostensibly consists of 64 bits; however, only 56 of these are actually used by the algorithm. Eight bits are used solely for checking parity, and are thereafter discarded. Hence the effective key length is 56 bits, and it is usually quoted as such.

DML: Short for Data Manipulation Language. DML is a family of computer languages used by computer programs or database users to retrieve, insert, delete and update data in a database. The currently most popular data manipulation language is that of SQL, which is used to retrieve and manipulate data. Data manipulation languages were initially only used by computer programs, but (with the advent of SQL) have come to be used by people, as well. Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are “select”, “insert”, “update”, and “delete”. This makes the nature of the language into a set of imperative statements (commands) to the database. Data manipulation languages tend to have many different “flavors” and capabilities between database vendors.

Hash Function: A function that converts an input from a (typically) large domain into an output in a (typically) smaller range (the hash value, often a subset of the integers). Hash functions vary in the domain of their inputs and the range of their outputs and in how patterns and similarities of input data affect output data. Hash functions are used in hash tables, cryptography, data processing, etc.

Kerberos: A computer network authentication protocol, which allows individuals communicating over an insecure network to prove their identity to one another in a secure manner.

MD5: Short for Message-Digest algorithm 5, a widely used cryptographic hash function with a 128-bit hash value. As an Internet standard, MD5 has been employed in a wide variety of security applications, and is also commonly used to check the integrity of files. MD5 digests are widely used in the software world to provide some assurance that a downloaded file has not been altered. A user can compare a publicized MD5 sum with the checksum of a downloaded file. On the assumption that the publicized checksum can be trusted to be authentic, a user can have considerable confidence that the file is the same as that released by the developers, protecting against Trojan horses and computer viruses being added to the software surreptitiously.

Plaintext: Plaintext is information used as input to an encryption algorithm; the output is termed ciphertext. The plaintext could be, for example, a diplomatic message, a bank transaction, an email, a diary and so forth: any information that someone might want to prevent others from reading. Plaintext is typically human readable, either directly or with some commonly available device, such as a Compact Disk player. In some systems, however, multiple layers of encryption are used, in which case the ciphertext output of one encryption algorithm becomes the plaintext input to the next.

Polyalphabetic Cipher: Any cipher based on substitution, using multiple substitution alphabets. For example, the Vigenere cipher applies a sequence of Caesar shifts selected by a keyword. In a single Caesar shift (which on its own is a monoalphabetic cipher) each letter of the alphabet is shifted along some number of places; for example, in a Caesar cipher of shift 3, A would become D, B would become E and so on.

Pseudo-column: Pseudo-columns are not actual columns in a table, but values can be selected from them. Row-ID (identification), the binary address of a row in a database, is an example of a pseudo-column.

Session: In computer science a session is either a lasting connection using the session layer of a network protocol or a lasting connection between a user (or user agent) and a peer, typically a server, usually involving the exchange of many packets between the user's computer and the server. A session is typically implemented as a layer in a network protocol (e.g. telnet, FTP).

SSL: Short for Secure Sockets Layer. SSL provides endpoint authentication and communications privacy over the Internet using cryptography. In typical use, only the server is authenticated (i.e. its identity is ensured) while the client remains unauthenticated. The protocols allow client/server applications to communicate in a way designed to prevent various attacks.

TLS: Short for Transport Layer Security, a protocol that guarantees privacy and data integrity between client/server applications communicating over the Internet. The TLS protocol is made up of two layers: (1) The TLS Record Protocol: layered on top of a reliable transport protocol, such as TCP, it ensures that the connection is private by using symmetric data encryption and it ensures that the connection is reliable. The TLS Record Protocol also is used for encapsulation of higher-level protocols, such as the TLS Handshake Protocol. (2) The TLS Handshake Protocol: allows authentication between the server and client and the negotiation of an encryption algorithm and cryptographic keys before the application protocol transmits or receives any data. TLS is application protocol-independent. Higher-level protocols can layer on top of the TLS protocol transparently. TLS supersedes and is an extension of SSL.

Vernam cipher: Vernam cipher (also known as “The one time pad”) uses a keyword as a key and is secure, as long as the keyword is never used again. It is a symmetric polyalphabetic cipher. One picks a keyword and then adds on each letter to a corresponding letter of the plaintext. The decryption is done using the same key, but subtracting the key letter value from the corresponding letter of the ciphertext. The plaintext cannot be longer than the key. A key which is used more than once reduces the one time pad to a Vigenere cipher, which is much easier to break.
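The letter-wise addition and subtraction described above can be sketched as follows. This is a minimal illustration over the 26 uppercase letters, assuming arithmetic modulo 26; the function names are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def random_key(length: int) -> str:
    """Draw a fresh key; it must be used once only for the scheme to remain secure."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def vernam_encrypt(plaintext: str, key: str) -> str:
    """Add each key letter to the corresponding plaintext letter (mod 26)."""
    if len(plaintext) > len(key):
        raise ValueError("the plaintext cannot be longer than the key")
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % 26]
        for p, k in zip(plaintext, key)
    )


def vernam_decrypt(ciphertext: str, key: str) -> str:
    """Subtract each key letter from the corresponding ciphertext letter (mod 26)."""
    if len(ciphertext) > len(key):
        raise ValueError("the ciphertext cannot be longer than the key")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext, key)
    )


key = random_key(5)
ciphertext = vernam_encrypt("HELLO", key)
print(ciphertext, vernam_decrypt(ciphertext, key))
```

With the fixed key "XMCKL", the plaintext "HELLO" encrypts to "EQNVZ".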

Web browser: A Web browser is a software package that enables a user to display and interact with documents hosted by web servers.

XOR: Exclusive disjunction (usual symbol XOR or CD) is a logical operator that results in true if one of the operands, but not both of them, is true.

BACKGROUND OF THE INVENTION

A database is an integral part of almost every information system. The key features databases offer are shared access, minimal redundancy, data consistency, data integrity and controlled access. The case where databases hold critical and sensitive information is quite common; therefore an adequate level of protection for database content has to be provided.

Database security methods can be divided into four layers:

-   physical security;
-   operating system security;
-   DBMS (Database Management System) security; and
-   data encryption.

The first three layers alone are not sufficient to guarantee the security of the database since the database data is kept in a readable form. Anyone having access to the database, including the DBA (Database Administrator), is able to read the data. In addition, the data is frequently backed up, so access to the backed up data also needs to be controlled. Moreover, a distributed database system makes it harder to control disclosure of the data.

The secure transmission of data and user authentication have been well studied and incorporated into today's e-business market. Almost all Web browsers and servers support SSL (Secure Sockets Layer) or TLS (Transport Layer Security) so, for example, a credit card number is protected on its way to the Web server. Vendors, such as VeriSign®, supply services of third party authentication. Before creating a secured channel, for example an SSL channel, Web browsers authenticate the destination address by verifying the authenticity of the Web server's certificate. However, once the data arrives securely at the certified server, support for storing and processing the data in a secure way is inadequate.

Security and privacy aspects of private data stored on a data storage server have recently become an interesting and challenging field of research. Encryption is a well-established technology for protecting sensitive data. Anyone having access to the encrypted data cannot learn anything about the sensitive data without the encryption key. Furthermore, encryption can be used to maintain data integrity so that any unauthorized changes of the data can easily be detected.

There are three general approaches to integrating cryptography into databases:

-   The first approach is called “loose coupling”. In this approach, the server implements pre-defined cryptographic services installed on the database server. One example is an encryption package that is stored on the database server and encrypts the newly inserted database content using the user supplied encryption key.
-   The second approach is called “tight coupling”. In this approach, a new set of cryptographic services is added to the DB as new SQL statements, together with the necessary control and execution context that ensures that the new SQL queries are executed securely. This approach is harder to implement than the previous one, since changes have to be made in the core database software.
-   The third approach is a mixture of both approaches, where some changes are implemented as new SQL statements while most of the changes are integrated into the database as stored procedures built over the new set of SQL statements.

The three approaches described above consider encryption to be performed in the database server. Thus, the database server is assumed to be trusted.

Database Encryption Methods

Database encryption can be implemented at different levels: tables, columns, rows and cells. Encrypting the whole table, column or row entails the decryption of the whole table, column or row respectively when a query is executed. Therefore, an implementation which decrypts only the data of interest is preferred.

Several database encryption methods have been proposed. For example, a database encryption method presented in U.S. Pat. No. 4,375,579 (on the basis of which the article “A Database Encryption System with Subkeys” by Davida, G. I., Wells, D. L. and Kam, J. B. was published) is based on the Chinese Remainder Theorem, where each row is encrypted using different sub-keys for different cells. This method enables encryption at the level of rows and decryption at the level of cells. However, U.S. Pat. No. 4,375,579 has a number of significant disadvantages:

-   a. It relies on a specific encryption function and cannot be used with an arbitrary symmetric or asymmetric encryption function.
-   b. Each encrypted record is a single function of all of its field values and each field is encrypted with a separate encryption key. In order to perform an update operation, all field values must be known. This means that a change can be made to a record only when all of the encryption keys are available. Updates can be performed only at secure periods when all of the encryption keys are accessible to the DBMS.
-   c. In order to perform management operations, such as adding or deleting a column, all of the encryption keys for that column have to be accessed and the values have to be decrypted (deleting or adding a column has an immediate effect on all of the fields in all of the records in the table).
-   d. It needs a special mechanism for updates, which can only be performed during secure periods. After each update, a row cannot be accessed until it is re-encrypted, since the selected values are not the updated values. In order to select specific fields, the entire record has to be retrieved in order to decrypt those specific fields.

Another database encryption method presented in “Multilevel Secure Database Encryption with Subkeys” by Min-Shiang, H., and Wei-Pang, Y. extends the encryption method presented in U.S. Pat. No. 4,375,579 by supporting multilayer access control. It classifies subjects and objects into distinct security classes which are ordered in a hierarchy such that an object with a particular security class can be accessed only by subjects in the same or a higher security class. In this method, each row is encrypted with sub-keys according to the security class of its cells. Still another database encryption method presented in “A Cryptographic Mechanism for Sharing Databases” by Buehrer, D., and Chang, C. proposes an encryption method for a database based on Newton's interpolating polynomials. One disadvantage of all the above methods is that the basic element in the database is a row and not a cell, thus the structure of the database is modified. In addition, all of those methods require re-encrypting the entire row when a cell value is modified.

A further database encryption method presented in “A Database Record Encryption Scheme Using RSA Public Key Cryptosystem and Its Master Keys” by Chang, C. C., and Chan, C. W. is based on the RSA public-key method and suggests two database encryption methods: one field oriented and the other record oriented. Both of the suggested methods support distinction between write and read access rights. The disadvantage of the field oriented encryption method is that it is not resistant to substitution attacks that swap two encrypted cells. The disadvantage of the record oriented method is similar to that of the record oriented encryption methods discussed above. A still further encryption method, provided in “Practical Techniques for Searches on Encrypted Data” by Song, D. X., Wagner, D., and Perrig, A., suggests computing the bitwise exclusive or (XOR) of the plaintext values with a sequence of pseudo-random bits generated by the client according to the plaintext value and a secure encryption key. This method supports searches over the encrypted data without revealing anything about the plaintext values except the locations of the searched plaintext. However, the proposed method does not protect against attacks that substitute two encrypted values in the database, and it requires query translation, since the pseudo-random bits for a searched value need to be computed by the client.

Still a further encryption method presented in “GBDE-GEOM Based Disk Encryption Source” by Kamp, P. H. suggests encrypting the entire physical disk, allowing the database to be protected. One of the disadvantages of that method is that the DBA can perform no administrative tasks on the database, since the entire content of the database is encrypted.

Therefore, it is an object of the present invention to provide a simple and efficient method and system for database encryption, overcoming the shortcomings of the prior art database encryption methods.

It is another object of the present invention to suggest how to encrypt the entire content of the database without changing its structure.

It is still another object of the present invention to allow the DBA to continue managing the database without being able to view or manipulate the database content.

It is still another object of the present invention to provide a method and system for database encryption, wherein anyone gaining access to the database cannot learn anything about its content, or tamper with the data unnoticed, without the encryption key.

It is a further object of the present invention to provide a method and system that decrypt only the data of interest.

It is still a further object of the present invention to provide a method and system for database encryption, wherein the structure of the database tables and indexes remains as before encryption.

It is still a further object of the present invention to provide a method and system for database encryption, wherein queries are not changed because of the encryption.

It is still a further object of the present invention to provide a method and system for database encryption, ensuring that existing applications can use the encrypted database without the need for any changes in the application software.

It is still a further object of the present invention to provide a method and system for secure database indexing, protecting against information leakage and unauthorized modifications.

It is still a further object of the present invention to provide a method and system for secure database indexing supporting discretionary access control in a multi-user environment.

Other objects and advantages of the invention will become apparent as the description proceeds.

Indexing Encrypted Databases

The conventional way to provide an efficient execution of database queries is using indexes. Indexes in an encrypted database raise the question of how to construct the index so that no information about the database content is revealed.

Increasingly, organizations and users prefer to outsource their data center operations to external application providers. As a consequence of this trend toward outsourcing, highly sensitive data is now stored on systems that are not under the data owner's control. While data owners may not entirely trust providers' discretion, preventing a provider from inspecting data stored on their own machines is difficult. For this kind of service to work successfully it is of primary importance to provide means of protecting the secrecy of the information remotely stored, while guaranteeing its availability to legitimate clients.

Communication between the client and the database service provider can be secured through standard encryption protocols such as SSL (Secure Sockets Layer). With regard to the stored data security, access control has proved to be useful, provided that data is accessed using the intended system interfaces. However, access control is useless if the attacker simply gains access to the raw database data, thus bypassing the traditional mechanisms. This kind of access can easily be gained by insiders, such as the system administrator and the database administrator (DBA).

Database encryption introduces an additional layer to conventional network and application security solutions, and prevents exposure of sensitive information even if the raw data is compromised. Database encryption prevents unauthorized users from viewing sensitive data in the database, and it allows database administrators to perform their tasks without having access to sensitive information. Furthermore, it protects data integrity, as unauthorized modifications can easily be detected.

A common technique to speed up query execution in databases is to use a pre-computed index, as described in “Database Management Systems” by Ramakrishnan, R. and Gehrke, J. However, once the data is encrypted, the use of standard indexes is not trivial and it depends on the encryption function used. Most encryption functions preserve equality; thus, hash indexes can be used, but information such as the frequencies of the indexed values is revealed. Most encryption functions do not preserve order; thus, B-Tree indexes can no longer be used once the data is encrypted.
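The equality and order properties can be illustrated with a short sketch. It uses HMAC-SHA256 purely as a stand-in for an equality-preserving (deterministic) encryption function; this is an assumption made for illustration, since an HMAC cannot be decrypted. The key and the sample values are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def deterministic_token(key: bytes, value: str) -> str:
    """Stand-in for a deterministic encryption function: equal inputs give equal outputs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def frequency_profile(items):
    """Sorted occurrence counts; anyone who can read the index can compute this."""
    return sorted(Counter(items).values())


def order_preserved(values, tokens) -> bool:
    """True only if sorting rows by token gives the same row order as sorting by value."""
    by_value = sorted(range(len(values)), key=lambda i: values[i])
    by_token = sorted(range(len(tokens)), key=lambda i: tokens[i])
    return by_value == by_token


key = b"demo-key-not-for-production"  # hypothetical key
salaries = [3000, 5000, 3000, 7000, 3000]
tokens = [deterministic_token(key, str(s)) for s in salaries]

# Equality is preserved, so a hash index works, but value frequencies leak.
print(frequency_profile(salaries), frequency_profile(tokens))

# Order is in general not preserved, so a B-Tree built on the tokens
# could not answer range queries on the plaintext values.
print(order_preserved(salaries, tokens))
```

The frequency profile of the tokens matches that of the plaintext values, which is exactly the statistical information an observer of a hash index over such ciphertexts would obtain.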

Furthermore, if several users with different access rights use the same index, each one of them needs access to the entire index, possibly to indexed elements which are beyond his access rights. Google™ Desktop, as an example of this problem, allows indexing and searching personal computer data. Using this tool, a legitimate user is able to bypass user names and passwords, and view personal data of other users who use the same computer, since it is stored in the same index.

Indexes are mostly structured as trees, which can reveal the order of the indexed nodes (by browsing the ordered leaves). This information can be exploited to estimate the value of a particular encrypted node, since the relative position of the encrypted node within the ordered set of nodes can imply the plaintext value of this node. In addition, the references to the positions of a particular indexed value may allow various statistical attacks on the indexed values. Even if the references to the indexed values are secured, a change to the index after an insert to the database provides the potential attacker with valuable information (an attacker could correlate the new value inserted to the index with the new value inserted to the database and thus reveal the reference for that value).

Several methods for encrypted indexing have been proposed in the past. For example, an indexing method provided in “Executing SQL Over Encrypted Data in the Database-Service-Provider Model” by Hacigumus, H., Iyer, B., Li, C., and Mehrotra, S. is based on encrypting the whole database row and assigning a set identifier to each value in this row. When searching for a specific value, its set identifier is calculated and then passed to the server, which in turn returns to the client a collection of all rows with values assigned to the same set. Finally, the client searches for the specific value in the returned collection and retrieves the desired rows. In this method, equal values are always assigned to the same set, thus some information is revealed when applying statistical attacks. Using this approach requires more computation by the client, since the result of the queries is not accurate. Furthermore, the sizes of the buckets assigned to the same set are also a matter to be considered.
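The query flow of such a set-identifier scheme can be sketched as follows. This is not the cited authors' implementation: the fixed-width bucketing rule, the row format and all names are assumptions, and the cipher is a toy SHA-256 keystream used only so that the example runs with the standard library.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (illustration only, not a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def bucket_id(salary: int, width: int = 1000) -> int:
    """Hypothetical set identifier: equal values always fall into the same set."""
    return salary // width


def build_server_table(key: bytes, rows):
    """The server stores only (set identifier, encrypted row) pairs."""
    return [(bucket_id(s), toy_encrypt(key, f"{n},{s}".encode())) for n, s in rows]


def server_select(table, wanted_bucket: int):
    """Server side: return every encrypted row in the requested set."""
    return [blob for b, blob in table if b == wanted_bucket]


def client_query(key: bytes, table, salary: int):
    """Client side: decrypt the candidates and discard the false positives."""
    candidates = server_select(table, bucket_id(salary))
    matches = []
    for blob in candidates:
        name, value = toy_decrypt(key, blob).decode().split(",")
        if int(value) == salary:
            matches.append((name, int(value)))
    return matches, len(candidates)


key = secrets.token_bytes(32)
table = build_server_table(key, [("alice", 3200), ("bob", 3900), ("carol", 5100)])
print(client_query(key, table, 3200))
```

In this sketch the server returns two candidate rows for the query on 3200, and the client has to decrypt both to find the single match, which is the extra client-side computation noted above.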

Another indexing method provided in “A Framework for Efficient Storage Security in RDBMS” by Iyer, B., Mehrotra, S., Mykletun, E., Tsudik, G., and Wu, Y. is based on constructing the index on the plaintext values and encrypting each page separately. Whenever a specific page of the index is needed for processing a query, it is loaded into memory and decrypted.

Since the uniform encryption of all pages is likely to provide many cipher breaking clues, still another indexing method provided in “Chip-secured data access: Confidential Data on Untrusted Servers” by Bouganim, L., and Pucheral, P. suggests encrypting each index page using a different key depending on the page number.
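One possible way to obtain a page-dependent key is to derive it from a master key and the page number. The sketch below assumes an HMAC-SHA256 derivation; the cited work is not claimed to use this construction, and the function name is hypothetical.

```python
import hashlib
import hmac
import secrets


def page_key(master_key: bytes, page_number: int) -> bytes:
    """Derive a 32-byte key for one index page (assumed HMAC-based derivation)."""
    label = b"index-page:" + page_number.to_bytes(8, "big")
    return hmac.new(master_key, label, hashlib.sha256).digest()


master = secrets.token_bytes(32)  # hypothetical master key
print(page_key(master, 1) != page_key(master, 2))  # different pages, different keys
```

Because the derivation is deterministic, only the master key has to be stored; the key for any page can be recomputed when that page is loaded for decryption.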

However, the above methods described in “A Framework for Efficient Storage Security in RDBMS” by Iyer, B., Mehrotra, S., Mykletun, E., Tsudik, G., and Wu, Y., and “Chip-secured data access: Confidential Data on Untrusted Servers” by Bouganim, L., and Pucheral, P., which are implemented at the level of the operating system, are not satisfactory, since in most cases it is not possible to modify the operating system implementation. Furthermore, in these methods, it is not possible to encrypt different portions of the database using different keys.

A further indexing method suggested by Boneh, D., Crescenzo, G. D., Ostrovsky, R., and Persiano, G. in “Public Key Encryption with Keyword Search” constructs a mechanism enabling the server to search for pre-defined keywords within a document using a special “trapdoor” supplied by the user for that keyword. Apart from the keyword, the method reveals nothing about the document. However, the above method does not support range queries, and query translation has to be performed, since the client has to compute the “trapdoor” for each keyword searched.

The major drawback of the last two methods is that there is no support for indexes structured as trees, since the server can only perform exact matches to the user's query and thus lacks the ability to evaluate the relation between two tree nodes in the index.

Assuming the index is implemented as a B+-Tree, encrypting each of its fields separately would reveal the ordering relationship between the encrypted values.

Still a further indexing method suggested in “Order Preserving Encryption for Numeric Data” by Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. builds the index over the data encrypted using an encryption method called OPES (Order Preserving Encryption Scheme). OPES allows comparison operations to be applied directly to the encrypted data. However, revealing the order of the encrypted values is not acceptable for every application.

Still a further indexing method provided in “Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs” by Damiani, E., De Capitani di Vimercati, S., Jajodia, S., Paraboschi, S., and Samarati, P. suggests encrypting each node of the B+-Tree as a whole. However, since references between the B+-Tree nodes are encrypted together with the index values, the index structure is concealed, and therefore the DBA finds the index unmanageable.

The Attacker Model

The attacker can be categorized into three classes:

-   Intruder: a person who gains access to a computer system and tries to extract valuable information.
-   Insider: a person who belongs to the group of trusted users and tries to get information beyond his own access rights.
-   Administrator: a person who has privileges to administer a computer system, but uses his administration rights in order to extract valuable information.

All of the above attackers can use different attack strategies:

-   Direct storage attacks: attacks against storage may be performed by accessing database files following a path other than through the database software, by physical removal of the storage media or by access to the database backup disks.
-   Indirect storage attacks: an adversary can access schema information, such as table and column names, metadata, such as column statistics, and values written to recovery logs in order to guess data distributions.
-   Memory attacks: an adversary can access the memory of the database software directly (this last attack is usually prevented at the hardware/operating system level).

When selecting the right approach for indexing encrypted databases, the following aspects should be considered:

-   a. Information Leakage: a secure index in an encrypted database should not reveal any information on the database plaintext values. The possible information leaks are: Static leakage: gaining information on the database plaintext values by observing a snapshot of the database at a certain time. For example, if the index is encrypted in a way that equal plaintext values are encrypted to equal ciphertext values, statistics about the plaintext values, such as their frequencies, can easily be learned. Linkage leakage: gaining information on the database plaintext values by linking a database value to its position in the index. For example, if the database value and the index value are encrypted in the same way (both ciphertext values are equal), an observer can search for the database ciphertext value in the index, determine its position and estimate its plaintext value. Dynamic leakage: gaining information about the database plaintext values by observing and analyzing the changes performed in the database over a period of time. For example, if a user monitors the index for a period of time, and if in this period of time only one value is inserted (no values are updated or deleted), the observer can estimate its plaintext value based on its position in the index.
-   b. Unauthorized Modification: in addition to the passive attacks that monitor the index, active attacks that modify the index should also be considered. Active attacks are more problematic, in the sense that they may mislead the user. For example, modifying index references to the database rows may result in queries returning an erroneous set of rows, possibly benefiting the adversary. Unauthorized modifications can be made in several ways: Spoofing: replacing a ciphertext value with a generated value; Splicing: replacing a ciphertext value with a different ciphertext value; Replay: replacing a ciphertext value with an old version previously updated or deleted.
-   c. Structure Preservation: when applying encryption to an existing database, it would be desirable that the structure of the database tables and indexes is not modified during the encryption. This ensures that the database tables and indexes can be managed in their encrypted form by a database administrator as usual, while keeping the database contents hidden. For example, if a hash index is used and the values therein are not distributed equally, performance might be undermined, and the DBA might wish to replace the hash function. In such a case, the DBA needs to know structure information, such as the number of values in each list, but does not need to know the values themselves.
-   d. Performance: indexes are used in order to speed up query execution. However, in most cases, using encrypted indexes causes performance degradation due to the overhead of decryption. Indexes in an encrypted database raise the question of how to construct the index so that no information about the database content is revealed, while performance in terms of time and storage is not significantly affected.

Discretionary Access Control (DAC)

In a multi-user (discretionary) database environment each user only needs access to the database objects (e.g., group of cells, rows and columns) needed to perform his job. Encrypting the whole database using the same key, even if access control mechanisms are used, is not enough. For example, an insider who has the encryption key and bypasses the access control mechanism can access data that are beyond his security group. Encrypting objects from different security groups using different keys ensures that a user who owns a specific key can decrypt only those objects within his security group. Following this approach, different portions of the same database column might be encrypted using different keys. However, a fundamental problem arises when an index is used for that column. In this case each one of the users, who belong to different security groups using different keys, needs access to the entire index, possibly including indexed elements that are beyond their access rights. The same problem arises when the index is updated.

Key Management in Database Encryption Methods

Databases contain information of different levels of sensitivity that have to be selectively shared between large numbers of users. Encrypting each column with a different key results in a large number of keys for each legitimate user. However, using the approach proposed in “Secure and Selective Dissemination of XML Documents” by Bertino, E., and Ferrari, E. can reduce the number of keys. That work shows how the smallest elements which can be encrypted using the same key according to the access control policy can be found. Thus, the keys are generated according to the access control policy in order to keep their number minimal. This approach can be incorporated in the proposed method to encrypt sets of columns with the same key in accordance with the database access control policy. The dynamic nature of encrypted databases adds complexity and special requirements to the key management process. However, “Secure and Selective Dissemination of XML Documents” by Bertino, E., and Ferrari, E. does not deal with database encryption problems.

Key management in encrypted databases can be performed at five different levels:

-   -   a. keys can be created on a database level; this implies that the whole database is encrypted using the same key, thus, users gaining access to the encryption key can access the whole database;
    -   b. keys can be created on a table level; each table will be encrypted using (possibly) a different key, and a user gaining access to one of the encryption keys can access all tables encrypted using that key;
    -   c. keys can be created in vertical-partitions-levels; in this case, each row can be encrypted using a different key;
    -   d. keys can be created on a column level; this enables each column to be encrypted using a different key; and
    -   e. keys can be created on a cell level; this enables maximal freedom when enforcing the access control policy by encryption but introduces difficulties when managing key updates, data manipulations and changes to the access control policy.

There are three different approaches to storing the encryption keys:

-   -   a. Storing the encryption keys at the server side: The server has full access to the encryption keys. All computation is performed at the server side.
    -   b. Storing encryption keys at the client side: The client never transfers the keys to the server and is responsible for performing all encryption and decryption operations. Where the database server has no access to the encryption keys, no computations can be performed at the server side since they entail revealing the database values.
    -   c. Keys per session: The database server has full access to the encryption keys during the session but does not store them on disk. This ensures that the user transaction can be performed entirely at the server side, during the session. However, since the keys are never kept in the database server after a session terminates, an attacker cannot learn anything about the database values as he has no access to the encryption keys.

If the database server (e.g., database service provider) is not trusted, it is preferred that the database server would not be able to learn anything about the stored data, and thus the keys are kept only at the client side. In cases when the database server is fully trusted, except for its physical storage (e.g., external storage provider, backup tapes stored in an untrusted location), the keys can be stored at the server side in some protected region.

The Desired Properties of a Database Encryption Method

According to “A Database Encryption System with Subkeys” by Davida, G. I., Wells, D. L., and Kam, J. B., a database encryption method should meet the following requirements:

-   -   security: it is mandatory that the encryption method should be either theoretically or computationally secure (require a high work factor to break it) as it is the only guarantee for data security, especially in cases where the database is stored in an untrusted site;
    -   performance: encryption and decryption should be fast enough so as not to degrade system performance (not affect the complexity of the database operations);
    -   data volume: the encrypted data should not have a significantly greater volume than the unencrypted data; the space complexity of the database storage before and after applying the encryption method should remain the same;
    -   decryption granularity: in order to support efficient random access, the encryption method should support the decryption of single database records without the need to access other records; moreover, database records should be independent of other records since the DBMS may rearrange records at any given time (e.g., sort table files for matters of performance, solve fragmentation problems);
    -   encrypting different columns under different keys: this should be supported; different users have different access rights and the encryption method should support the enforcement of access rights using encryption;
    -   pattern matching and substitution attacks: the encryption method should protect against attacks that use pattern matching and substitution of encrypted values; any unauthorized substitution should be detected at decryption time;
    -   unauthorized access detection: data modified by an unauthorized user should be noticed at decryption time; and
    -   maintain database structure: the security mechanism should be flexible and not entail any change in the structure of the database. The structure of the database refers to two main aspects: (a) the internal database files and algorithms representing the implementation of the DBMS, (b) the SQL queries together with all the interface commands used in order to manipulate and retrieve data. Preferably, applying the new encryption method should not entail any changes to the internal representation or implementation of the database or change the way the user interacts with the DBMS.

A naive approach for database encryption is to encrypt each cell separately. This approach has several drawbacks.

First, two equal plaintext values are encrypted to equal ciphertext values. Therefore, it is possible, for example, to collect statistical information as to how many different values a specified column currently has. The same holds for the ability to execute a join operation between two tables and collect information from the results.
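The first drawback can be illustrated with a minimal sketch. It assumes a toy 4-round Feistel construction built from HMAC-SHA256 as a stand-in for a real block cipher such as AES; the function names, the key and the sample column are hypothetical, and the toy cipher is for illustration only.

```python
import hashlib
import hmac
from collections import Counter

BLOCK = 16  # block size in bytes, matching the 128-bit AES block


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel network (illustrative only).
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def E(key: bytes, block: bytes) -> bytes:
    """Toy deterministic block cipher standing in for a real one; NOT for real use."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def naive_encrypt(key: bytes, value: int) -> bytes:
    # Each cell is encrypted on its own, with no position information.
    return E(key, value.to_bytes(BLOCK, "big"))


key = b"hypothetical-demo-key"
column = [16, 7, 16, 42, 16]
ciphertexts = [naive_encrypt(key, v) for v in column]

# Without the key, an observer can still count repeated values.
frequencies = sorted(Counter(ciphertexts).values(), reverse=True)
print(frequencies)  # [3, 1, 1]
```

The observer never decrypts anything, yet learns that one value occupies three of the five cells.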

Second, it is possible to switch unnoticed between two ciphertext values. Different ciphertext values for equal plaintext values can be achieved using a polyalphabetic cipher, for example the Vernam cipher. However, in this solution decryption of a record depends on other records, and thus the decryption granularity requirement described above is violated.

Encryption Granularity

Table/Index encryption can be performed at various levels of granularity: single values, records/nodes, pages or whole table/index. When choosing the level of granularity, the following should be considered:

-   -   a. Information Leakage: The higher the level of encryption granularity, the less information is revealed. Single values level encryption of the table/index reveals sensitive information, such as frequencies of the table/index values. Whole index level encryption ensures that information about the data cannot be leaked, since it is encrypted as one unit.
    -   b. Unauthorized Modifications: Encryption at higher levels of granularity makes it harder for the attacker to tamper with the data. Single values level encryption of the table/index allows an attacker to switch two ciphertext values without being noticed. Whole table/index level encryption implies that a minor modification to the encrypted table/index has a major effect on the plaintext table/index and can easily be detected.
    -   c. Structure Perseverance: Higher levels of encryption granularity conceal the table/index structure. Whole table/index level encryption changes the structure of the index, since the basic element of reference is changed from a single value to the entire table/index. Single values level encryption of the table/index preserves its structure.
    -   d. Performance: Finer encryption granularity affords more flexibility in allowing the server to choose what data to encrypt or decrypt. Whole table/index level encryption requires the whole table/index to be decrypted, even if a small number of table/index nodes are involved in the query. Single values level encryption of the table/index enables decryption of values of interest only.

Better performance and preservation of the database structure cannot be achieved using page-level or whole table/index encryption granularity. However, special techniques can be used in order to cope with unauthorized modifications and information leakage when single-value or record/node encryption granularity is used.

Hereinafter, it is assumed that the encryption keys are kept per session and that the table and index are encrypted at the single values level of granularity.

SUMMARY OF THE INVENTION

The present invention relates to a Structure Preserving Database Encryption (SPDE) method and system, wherein no one with access to the encrypted database can learn anything about its content without the encryption key. Also a secure index for an encrypted database is provided. Furthermore, a secure database indexing system and method are described, providing protection against information leakage and unauthorized modifications by using encryption, dummy values and pooling, and supporting discretionary access control in a multi-user environment.

The Structure Preserving Database Encryption system for a database encryption, comprises: (a.) a client for: (a.1.) receiving one or more encryption keys, according to the client's access right definition; (a.2.) generating a session; (a.3.) transferring to said database server said one or more encryption keys; and (a.4.) generating at least one query; and (b.) an authentication server for identifying said client and transferring to him said one or more encryption keys; and (c.) a database server for: (c.1.) communicating with said client by means of said session generated by said client; (c.2.) searching an encrypted database for the corresponding data requested in said at least one query; (c.3.) after finding said corresponding data, decrypting said corresponding data by means of said one or more encryption keys; and (c.4.) transferring the results of said at least one query to said client.

The Structure Preserving Database Encryption method for a database encryption, comprises: (a.) identifying a client by means of an authentication server communicating over a conventional identification protocol; (b.) receiving one or more encryption keys from said authentication server by the client, said one or more encryption keys being relevant for performing at least one query of said client, according to the client's access right definition; (c.) generating a session by means of said client with a database server; (d.) transferring from said client to said database server the corresponding one or more encryption keys received from said authentication server; (e.) generating said at least one query by the client; (f.) searching by means of said database server an encrypted database for the corresponding data requested in said at least one query; (g.) after finding said corresponding data, decrypting said corresponding data by means of said one or more corresponding encryption keys; and (h.) transferring the results of said at least one query from said database server to said client.

The Structure Preserving Database Encryption method for a database encryption, said database consisting of at least one table having one or more rows, columns and cells, comprising the steps of the encryption of each cell value: (a.) determining a value stored in a corresponding cell; (b.) determining the position of said cell within a database by determining said cell table, row and column identifiers; (c.) activating a function concatenating said cell table, row and column identifiers and as a result of said concatenating obtaining a number based on said identifiers; (d.) performing an XOR operation between said number and said value stored in said cell or concatenating said number with said value stored in said cell; and (e.) activating an encryption function on a result obtained from said XOR operation or said concatenating of said number with said value stored in said cell.

Preferably, the Structure Preserving Database Encryption method further comprises: (a.) activating a hash function on the result of the concatenating and as a result obtaining another number based on the cell table, row and column identifiers; (b.) performing an XOR operation between said another number and the value stored in the cell or concatenating said another number with said value stored in said cell; and (c.) activating an encryption function on a result obtained from said XOR operation or the concatenating of said another number with said value stored in said cell.

Preferably, the Structure Preserving Database Encryption method further comprises the steps of the decryption of each cell value: (a.) activating on an encrypted value a decryption function which decrypts said encrypted value and as a result a decrypted value is obtained; and (b.) performing an XOR operation between said decrypted value and the number obtained as the result of the concatenating of the cell table, row and column identifiers.

Preferably, the Structure Preserving Database Encryption method further comprises the steps of the decryption of each cell value: (a.) activating on an encrypted value a decryption function which decrypts said encrypted value and as a result a decrypted value is obtained; and (b.) performing the XOR operation between said decrypted value and another number obtained as the result of activating the hash function, or discarding said another number from said decrypted value.

The method for database encryption, wherein said database comprises an index consisting of values of at least one table having one or more rows, columns and cells, said method comprises the steps of the encryption of each index entry: (a.) determining a value stored in a corresponding cell; (b.) concatenating said value stored in said cell with a random number having a fixed number of bits or concatenating said value stored in said cell with a row identifier of said cell; and (c.) activating an encryption function on a result obtained from said concatenating.

Preferably, the method for database encryption, wherein said database comprises an index consisting of values of at least one table having one or more rows, columns and cells, said method further comprises the steps of the encryption of each index entry: (a.) obtaining an internal pointer to index entries; (b.) obtaining an external pointer to a corresponding row in a table wherein said value is stored; (c.) encrypting said external pointer by means of a conventional encryption function; and (d.) activating an authentication code function, said authentication code function: (d.1.) concatenating together: (i.) the value stored in the corresponding cell; (ii.) said internal pointer to index entries; (iii.) said external pointer to said corresponding row in the table wherein said value is stored; and (iv.) an entry self-pointer; and (d.2.) calculating a message authentication code value from said concatenating.
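The authentication code step can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 is chosen as the message authentication code (the text does not name one), pointers are modeled as plain 64-bit integers, the value is length-prefixed so that field boundaries are unambiguous, and the encryption of the external pointer in step (c.) is omitted. All function and parameter names are hypothetical.

```python
import hashlib
import hmac
import struct


def entry_mac(mac_key: bytes, value: bytes, internal_ptr: int,
              external_ptr: int, self_ptr: int) -> bytes:
    # Concatenate value, internal pointer, external pointer and self-pointer,
    # then compute the authentication code over the result.
    message = (
        struct.pack(">I", len(value)) + value
        + struct.pack(">QQQ", internal_ptr, external_ptr, self_ptr)
    )
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify_entry(mac_key: bytes, value: bytes, internal_ptr: int,
                 external_ptr: int, self_ptr: int, tag: bytes) -> bool:
    expected = entry_mac(mac_key, value, internal_ptr, external_ptr, self_ptr)
    return hmac.compare_digest(expected, tag)


mac_key = b"hypothetical-mac-key"
tag = entry_mac(mac_key, b"16", internal_ptr=7, external_ptr=1003, self_ptr=42)

ok = verify_entry(mac_key, b"16", 7, 1003, 42, tag)
redirected = verify_entry(mac_key, b"16", 7, 2005, 42, tag)  # row reference changed
moved = verify_entry(mac_key, b"16", 7, 1003, 43, tag)       # entry copied to another slot
print(ok, redirected, moved)
```

Including the self-pointer ties the code to the entry's own location, so an entry copied to a different position in the index no longer verifies.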

Preferably, the method for database encryption, wherein said database comprises an index consisting of values of at least one table having one or more rows, columns and cells, said method further comprises: (a.) defining a fixed size pool for each index, said pool holding one or more values for inserting into the corresponding index; and (b.) updating said each index with corresponding said one or more values only if said pool is full.

Preferably, the method for database encryption, wherein said database comprises an index consisting of values of at least one table having one or more rows, columns and cells, said method further comprises extracting corresponding values from the corresponding pool to the corresponding index in a random order.
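A minimal sketch of pooling is given below. It assumes a sorted Python list as a stand-in for the index, works on plaintext integers for readability, and uses hypothetical names (`PooledIndex`, `contains`); the lookup that also consults the pool is an assumption of this sketch rather than something stated above.

```python
import bisect
import random


class PooledIndex:
    """Sorted list standing in for the index, fronted by a fixed-size pool."""

    def __init__(self, pool_size: int):
        self.pool_size = pool_size
        self.pool = []    # values waiting to be inserted
        self.index = []   # stand-in for the real index structure
        self._rng = random.SystemRandom()

    def insert(self, value: int) -> None:
        self.pool.append(value)
        # The index is updated only once the pool is full.
        if len(self.pool) == self.pool_size:
            self._flush()

    def _flush(self) -> None:
        # Values leave the pool in random order, so insertion order into
        # the index does not reveal arrival order.
        self._rng.shuffle(self.pool)
        while self.pool:
            bisect.insort(self.index, self.pool.pop())

    def contains(self, value: int) -> bool:
        i = bisect.bisect_left(self.index, value)
        in_index = i < len(self.index) and self.index[i] == value
        return in_index or value in self.pool


idx = PooledIndex(pool_size=3)
idx.insert(5)
idx.insert(1)
pending = list(idx.index)  # still empty: the pool is not yet full
idx.insert(9)              # third value fills the pool and triggers the flush
print(pending, idx.index)
```

Because several values reach the index together and in shuffled order, an observer watching the index cannot tie a single new entry to a single insert operation.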

A method for executing a client's query in an encrypted-index database, by means of a database server using sub-indexes, comprises: (a.) connecting to a database server by means of a client and identifying said client; (b.) creating a secure session between said database server and said client; (c.) transferring one or more encryption keys by means of said client to said database server; (d.) submitting a query by means of said client to said database server; (e.) locating corresponding sub-indexes which said client is entitled to access; (f.) executing said query on said corresponding sub-indexes by means of said database server using said one or more encryption keys; and (g.) transferring a result of said query to said client.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIG. 1 illustrates the security perimeter in the DAS model, according to the prior art;

FIG. 2 is a schematic illustration of the system architecture of the Structure Preserving Database Encryption (SPDE) method, according to a preferred embodiment of the present invention;

FIG. 3 illustrates a database encryption method, according to the prior art;

FIG. 4 discloses a database encryption employing a Structure Preserving Database Encryption (SPDE) method, wherein the structure of the database tables and indexes remain as before encryption, according to a preferred embodiment of the present invention;

FIG. 5 is a schematic illustration of a database and index encryption, according to a preferred embodiment of the present invention;

FIG. 6A and FIG. 6B are schematic illustrations of a database index using pooling, according to a preferred embodiment of the present invention;

FIG. 7 illustrates the use of sub-indexes, according to a preferred embodiment of the present invention; and

FIG. 8 illustrates how a query is executed using sub-indexes, according to a preferred embodiment of the present invention.

It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The recent explosive increase in Internet usage, together with advances in software and networking, has resulted in organizations being able to share data easily for a variety of purposes. This has led to a new paradigm, “Database as a service” (DAS), in which the whole process of database management is outsourced by enterprises in order to reduce costs and to concentrate on the core business.

FIG. 1 illustrates the security perimeter in the DAS model, according to the prior art. Client 100 performs encryption and decryption operations within the security perimeter 101 while the database server 110, not being trusted, remains outside the security perimeter. In such cases where the database server 110 is not trusted, the process of encryption cannot be performed by said server 110. Defining the encryption method under the assumption that server 110 is not trusted raises many questions. One core issue is query processing in the DAS model, since the data is stored encrypted and the server has no access to the encryption keys. One way to implement a query in the DAS model is to transfer the data from the untrusted servers to the security perimeter 101. Once inside the security perimeter 101, data can be decrypted and the query processed. However, not only is this approach impractical for large databases, it also implies that only the storage is outsourced. Furthermore, the server is expected to be able to perform database operations, such as checking constraints, building indexes, ensuring consistency and executing queries.

FIG. 2 is a schematic illustration of the system architecture of the Structure Preserving Database Encryption (SPDE) method, according to a preferred embodiment of the present invention. Client 202 generates SQL commands (queries) and receives results to said queries from Database Server 203. Client 202 is responsible for generating a session and transferring encryption keys to Database Server 203. The encryption keys are used during the generated session by Database Server 203 for encryption and decryption operations needed for performing queries of Client 202. Database Server 203 is used for performing SQL commands by means of Database Management System (DBMS) 210, said commands being received from Client 202, by use of encryption keys also received from said Client 202. Encrypted Database 215 comprises the encrypted data. Authentication Server 201 comprises the encryption keys of Client 202. Client 202 wishing to perform queries on Database Server 203 has to be identified by Authentication Server 201 in order to receive the encryption keys. After Client 202 has been identified by Authentication Server 201, the encryption keys are transferred from said Authentication Server 201 to Client 202. Then Client 202 transfers the encryption keys to Database Server 203.

It should be noted that Client 202, according to all preferred embodiments of the present invention, refers to a computer and/or to a person.

At step 221, Client 202 identifies itself to Authentication Server 201 by means of a conventional identification protocol, such as Kerberos. After Client 202 has been identified by Authentication Server 201, at step 222 Client 202 receives the encryption keys which are relevant for performing said Client 202 queries, according to said Client 202 access right definition. Each client can have different encryption keys according to his access right definition for accessing various data tables stored in Database Server 203. A Client 202 wishing to access data for which he does not have a corresponding encryption key is not able to decrypt said data, since he does not have the encryption key with which said data was encrypted. Then at step 223, Client 202 generates a session with Database Server 203 and transfers to said Database Server 203 the corresponding encryption keys, which are used by Database Server 203 for performing queries received from Client 202. At step 224, Client 202 generates a query (at least one SQL command is sent to Database Server 203). At step 225, Database Server 203 searches Encrypted Database 215 for the corresponding data requested in the above query, and after such data is found, said data is decrypted by means of the corresponding encryption keys. The results of the above query are transferred from Database Server 203 to said Client 202 in non-encrypted form, and therefore Client 202 does not need to perform any decryption operation on said results.
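Steps 221 to 225 can be sketched as follows. The sketch is illustrative only and rests on several assumptions: a password comparison stands in for an identification protocol such as Kerberos, one key per table stands in for the access right definition, `toy_cipher` is an insecure XOR stream used as a placeholder for a real cipher, and all class, method and table names are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream (its own inverse); placeholder for a real cipher, NOT secure."""
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(data))


class AuthenticationServer:
    def __init__(self, passwords: dict, key_rings: dict):
        self._passwords = passwords
        self._key_rings = key_rings  # user -> {table: key}

    def identify_and_issue_keys(self, user: str, password: str) -> dict:
        # Steps 221-222: identify the client, then hand over only its keys.
        if self._passwords.get(user) != password:
            raise PermissionError("identification failed")
        return dict(self._key_rings[user])


class DatabaseServer:
    def __init__(self, encrypted_tables: dict):
        self._tables = encrypted_tables  # table -> list of ciphertext cells
        self._sessions = {}              # session id -> keys, held in memory only

    def open_session(self, keys: dict) -> str:
        # Step 223: the client transfers its keys for this session.
        sid = secrets.token_hex(8)
        self._sessions[sid] = keys
        return sid

    def query(self, sid: str, table: str) -> list:
        # Steps 224-225: find the data, decrypt it, return plaintext results.
        key = self._sessions[sid].get(table)
        if key is None:
            raise PermissionError(f"no key for table {table!r} in this session")
        return [toy_cipher(key, cell).decode() for cell in self._tables[table]]

    def close_session(self, sid: str) -> None:
        self._sessions.pop(sid, None)  # keys are dropped with the session


keys = {"orders": b"key-orders", "salaries": b"key-salaries"}
encrypted_db = {
    "orders": [toy_cipher(keys["orders"], b"order-1"),
               toy_cipher(keys["orders"], b"order-2")],
    "salaries": [toy_cipher(keys["salaries"], b"9000")],
}
auth = AuthenticationServer({"alice": "pw"}, {"alice": {"orders": keys["orders"]}})
db = DatabaseServer(encrypted_db)

session_keys = auth.identify_and_issue_keys("alice", "pw")
sid = db.open_session(session_keys)
rows = db.query(sid, "orders")
try:
    db.query(sid, "salaries")
    denied = False
except PermissionError:
    denied = True
db.close_session(sid)
print(rows, denied)
```

In this sketch the server rejects the second query outright because the session holds no key for that table; the text above only requires that such data cannot be decrypted.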

The system architecture of FIG. 2 ensures that, after a secure session is created, the whole process of encryption and decryption performed by Database Server 203 is transparent to Client 202.

The system and method, according to a preferred embodiment of the present invention, assume that the database server is trusted. That is, all encryption and decryption operations will be performed on the server. In order to perform these operations, all the necessary encryption keys should be accessible to the server during the valid session of a logged-on user. These keys should be retained only in the server's memory during the session. The encryption method introduces a new line of defense for “data at rest”: a DBA managing the database has no access to any of the encryption keys, and learns nothing about the database values. Furthermore, an intruder managing to break into the database and read the stored data cannot learn anything about the database values. Moreover, when the data is backed up, only the encrypted form of the data is stored on the backup site, thus the data is secured against data disclosure.

FIG. 3 illustrates a database encryption method, according to the prior art. A table 300 has, for example, one data column “C” numbered 302 and ten rows (a column showing the identifiers of rows is numbered 301). A table 310, which is the encryption of table 300, also has, for example, one data column “CC” numbered 312 and ten rows (a column showing the identifiers of rows is numbered 311). The equal plaintext values in table 300 are encrypted to the corresponding equal ciphertext values in table 310. For example, cells 303, 304 and 305 in table 300 have equal values of “16”. As a result, in table 310 the corresponding cells 313, 314 and 315 also have equal ciphertext values “#$”. Therefore, this prior art method is sensitive to substitution attacks, which attempt to switch encrypted values, and to pattern matching attacks, which attempt to gather statistics based on the database encrypted values.

FIG. 4 discloses a database encryption employing a Structure Preserving Database Encryption (SPDE) method, wherein the structure of the database tables and indexes remain as before encryption, according to a preferred embodiment of the present invention. A table 300 has, for example, one data column “C” numbered 302 and ten rows (a column showing the identifiers of rows is numbered 301). A table 320, which is the encryption of table 300, also has, for example, one data column “CCC” numbered 322 and ten rows (a column showing the identifiers of rows is numbered 321). Each database cell value in table 320 is encrypted with its cell coordinates and therefore, the equal plaintext values in table 300, for example the values “16” in cells 303, 304 and 305, are encrypted to the corresponding different ciphertext values in table 320.

Therefore, the SPDE method, according to a preferred embodiment of the present invention, has two immediate advantages. First, it eliminates substitution attacks attempting to switch encrypted values. Second, pattern matching attacks attempting to gather statistics based on the database encrypted values would fail.

The SPDE system and method ensure that database tables and indexes can be managed as usual by a DBA in their encrypted form, while keeping the data secure. Furthermore, since the database structure remains the same, queries are not changed because of the encryption. This ensures that existing applications can use the encrypted database without the need for any changes in the application software. The basic assumption behind the SPDE method is the existence of an internal cell identifier which is beyond the reach of an adversary and thus is tamper-proof. Most commercial DBMSs, such as Oracle® and MS-SQL®, generate row-ids for each record. A row-id is a pointer to a database row defining the physical location of that row in the database. Thus, if changed, the row-id will no longer identify the same row. The existence of row-ids ensures that the SPDE method is applicable in commercial databases. The position of a cell in the database is unique and can be identified using the triplet that includes its Table ID (identification), Row ID, and Column ID. This triplet is hereinafter referred to as the cell coordinates.

According to a preferred embodiment of the present invention, each database value is encrypted with its unique cell coordinates. These coordinates are used in order to break the correlation between ciphertext and plaintext values in an encrypted database.

Encryption/Decryption in SPDE System and Method

Let us define:

V_(trc) - a plaintext value located in table t, row r and column c.

μ:(N×N×N)→N - a function generating a unique number based on the database coordinates.

Enc_(k) - a function which encrypts a plaintext value with its coordinates.

The encryption of the plaintext value V_(trc), according to a preferred embodiment of the present invention, is defined by the following equation:

Enc_(k)(V_(trc)) = E_(k)(V_(trc) ⊕ μ(t,r,c))

where k is the encryption key, ⊕ is the XOR logical operator and E_(k) is a symmetric encryption function (e.g. DES, AES).

X_(trc)—A ciphertext value located in table t, row r and column c.

X_(trc)=Enc_(k)(V_(trc))

It should be noted that, in order to cope with statistical attacks, according to another preferred embodiment of the present invention a hash function is applied to μ(t,r,c), and as a result a number based on the cell identifiers t, r and c is obtained. Then an XOR logical operation is performed between the plaintext value V_(trc), located in table t, row r and column c, and the number obtained by applying the hash function to μ(t,r,c). The result of the XOR logical operation is then encrypted by the symmetric encryption function E_(k), obtaining Enc_(k)(V_(trc)).

The decryption of the ciphertext value X_(trc), according to a preferredembodiment of the present invention, is defined by the followingequation:

Dec _(k)(X _(trc))=D _(k)(X _(trc))⊕μ(t,r,c)=V _(trc)

where k is the decryption key, D_(k) is a symmetric decryption function and Dec_(k) is a function which decrypts the ciphertext value X_(trc) and discards its coordinates.

In order to decrypt the ciphertext value X_(trc) in the case where the hash function was applied to μ(t,r,c) during encryption, the decryption of said ciphertext value X_(trc) comprises the following steps:

-   applying the symmetric decryption function D_(k) to said ciphertext value X_(trc); and
-   performing the XOR logical operation between the result obtained from said symmetric decryption function D_(k) and the result obtained by applying the hash function to μ(t,r,c).
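The XOR-based encryption and decryption steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy four-round Feistel cipher `E`/`D` merely stands in for E_(k)/D_(k) (DES or AES in the text) so that the example runs with the standard library only, SHA-256 is an assumed choice of hash function, and the names `mu`, `enc`, `dec` and the 16-byte block size are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # bytes; assumed block size of the stand-in cipher


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function built from HMAC-SHA256 (toy construction).
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def E(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher standing in for E_k; not for production use."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def D(key: bytes, block: bytes) -> bytes:
    """Inverse of E, standing in for D_k."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def mu(t: int, r: int, c: int) -> bytes:
    """Hashed cell coordinates, truncated to one block (hash choice is assumed)."""
    return hashlib.sha256(f"{t}|{r}|{c}".encode()).digest()[:BLOCK]


def enc(key: bytes, value: int, t: int, r: int, c: int) -> bytes:
    # Enc_k(V_trc) = E_k(V_trc XOR hash(mu(t, r, c)))
    return E(key, _xor(value.to_bytes(BLOCK, "big"), mu(t, r, c)))


def dec(key: bytes, cipher: bytes, t: int, r: int, c: int) -> int:
    # Dec_k(X_trc) = D_k(X_trc) XOR hash(mu(t, r, c))
    return int.from_bytes(_xor(D(key, cipher), mu(t, r, c)), "big")
```

With this sketch, the same plaintext stored in two different rows produces two different ciphertexts, and decrypting a ciphertext with the coordinates of another cell does not return the original value.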

Encryption ensures that a user not possessing the encryption key cannot modify a ciphertext value and predict the change in the plaintext value. Usually the range of valid plaintext values is significantly smaller than the whole range of possible plaintext values. Thus, the probability that an unauthorized change to a ciphertext value would result in a valid plaintext value is negligible. Therefore, unauthorized changes to ciphertext values are likely to be noticed at decryption time (the decrypted value will be meaningless).

Substitution attacks, as opposed to pattern matching attacks, cannot be prevented by simply using encryption. In the SPDE method, each value is encrypted with its unique cell coordinates. Therefore, trying to decrypt a value with different cell coordinates (e.g. as a result of a substitution attack) would probably result in an invalid plaintext value.

If the range of valid plaintext values is not significantly smaller than the whole possible range, or invalid plaintext values cannot be distinguished from valid plaintext values, encryption has to be carried out as follows:

Enc _(k)(V _(trc))=E _(k)(μ(t,r,c)∥V _(trc))

Since μ(t,r,c) is concatenated to the plaintext value before encryption, attempting to change the ciphertext value or trying to switch two ciphertext values would result in a corrupted μ(t,r,c) after decryption. Obviously, concatenating μ(t,r,c) results in data expansion. It should be noted that, in order to cope with statistical attacks, according to another preferred embodiment of the present invention, a hash function is applied to μ(t,r,c), yielding a number based on the cell identifiers t, r and c. This number is then concatenated with V_(trc) and encrypted by the symmetric encryption function E_(k), obtaining Enc_(k)(V_(trc)).

The decryption process for decrypting the encrypted value X_(trc) (X_(trc)=Enc_(k)(V_(trc))), in the case where the encryption was performed by concatenating the result of the hash function to the plaintext value before encryption, comprises the following steps:

-   applying the symmetric decryption function D_(k) to X_(trc), thereby obtaining a decrypted value D_(k)(X_(trc)); and
-   discarding said result of said hash function from said decrypted value D_(k)(X_(trc)).

It should be noted that the operation of discarding is the opposite of the operation of concatenating.
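The concatenation variant can be sketched in the same spirit. As an assumption of this sketch, the decryption step compares the recovered coordinate hash with the expected one before discarding it, which is how a corrupted μ(t,r,c) would be noticed. The toy Feistel cipher `E`/`D` again only stands in for E_(k)/D_(k), SHA-256 is an assumed hash, and the 8-byte field widths are hypothetical.

```python
import hashlib
import hmac

HALF = 8  # bytes for the hashed coordinates and for the value (assumed widths)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function built from HMAC-SHA256 (toy construction).
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def E(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher standing in for E_k; not for production use."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def D(key: bytes, block: bytes) -> bytes:
    """Inverse of E, standing in for D_k."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def mu(t: int, r: int, c: int) -> bytes:
    """Hashed cell coordinates (hash choice and truncation are assumed)."""
    return hashlib.sha256(f"{t}|{r}|{c}".encode()).digest()[:HALF]


def enc(key: bytes, value: int, t: int, r: int, c: int) -> bytes:
    # Enc_k(V_trc) = E_k(hash(mu(t, r, c)) || V_trc)
    return E(key, mu(t, r, c) + value.to_bytes(HALF, "big"))


def dec(key: bytes, cipher: bytes, t: int, r: int, c: int) -> int:
    plain = D(key, cipher)
    coords, value = plain[:HALF], plain[HALF:]
    # A moved or modified ciphertext yields a coordinate field that no longer matches.
    if not hmac.compare_digest(coords, mu(t, r, c)):
        raise ValueError("coordinate check failed: cell was moved or modified")
    return int.from_bytes(value, "big")  # the coordinate field is discarded
```

The price of the explicit check is the data expansion noted above: every stored cell grows by the width of the coordinate field.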

FIG. 5 is a schematic illustration of a database and index encryption, according to a preferred embodiment of the present invention. An exemplary table 501 identified by “T” is a conventional table in a database. Table 501 has, for example, one data column “D” numbered 503, and seven rows (a column showing the identifiers of rows is numbered 502). Suppose that a user wishes to encrypt data column “D” 503. Index tree before encryption 510 presents the index links which would be created if column 503 of table 501 were not encrypted. Each value of column “D” 503 is represented as a node in index tree before encryption 510. For example, the value “10” numbered 511 is the root of tree 510, positioned at the highest level of said tree 510. The root “10” has two sons, “15” and “5”, numbered 512 and 513, respectively. The index pointers are divided into two types of pointers, illustrated by means of dashed and solid lines numbered 520 and 521, respectively. Solid lines 521 represent internal index pointers defining the structure of index tree 510 (defining root “10” numbered 511 of index tree 510 and defining the sons of each node, such as node 512 or 513 of said index tree 510). Dashed lines 520 are external index pointers to the rows of table 501, such as the row identified by “0” or “1”. These external index pointers point to the rows in which the corresponding value of each node of index tree 510 is located.

According to a preferred embodiment of the present invention, external index pointers represented by dashed lines 520 are concealed so that the adversary cannot learn the link between the values of nodes in index tree 510 and the corresponding positions of said values in table 501. Since internal index pointers represented by solid lines 521 are important for performing various administrative operations, they remain unconcealed. Encrypted exemplary table 531 of table “T” 501 comprises one data column “DD” numbered 533, and seven rows (a column showing the identifiers of rows is numbered 532). In column “DD” numbered 533, each corresponding value of table 501 is encrypted by means of the symmetric encryption function E_(k), such as DES or AES. Here k is the encryption key, ⊕ is the XOR logical operator and μ(T,R,D) is a function generating a unique number based on the database coordinates, wherein “T” is a table identifier, “R” is a row identifier and “D” is a column identifier of each corresponding value in table 501. Each cell value is encrypted with its unique cell coordinates. For example, value “10” is positioned in table “T” 501, in row “0” and in column “D” numbered 503. Therefore, the position of the value “10” is defined by (T, 0, D), as indicated in cell 534. After the unique position of value “10” has been identified, the μ function is applied to said unique position: μ(T, 0, D). As a result, the μ function generates a number (value) from the set of three numbers “T”, “0” and “D”. Then, the ⊕ (XOR) operation is performed between the value “10” and the number generated by the μ function, as indicated in cell 534: 10 ⊕ μ(T, 0, D). Then 10 ⊕ μ(T, 0, D) is encrypted by means of the symmetric encryption function E_(k), such as DES or AES, wherein k is the encryption key: E_(k)(10⊕μ(T, 0, D)), as indicated in cell 534.

Encrypted exemplary index table 540 comprises a data column “Data” numbered 543, structure column 542 and seven rows (a column showing the identifiers of rows is numbered 541). Index table 540 comprises the encrypted index of index tree 510 represented in the form of a table, since said index is stored in a database in this form. Structure column 542 comprises values of internal index pointers represented by solid lines 521 in index tree 510. For example, in the row identified by “0” the values that are indicated in structure column 542 are “1” and “2”. These values are related to rows “1” and “2” of table 540; said rows “1” and “2” comprise encrypted data relating to values “5” and “15” of index tree 510 (values “5” and “15” are indicated in the left part of symmetric encryption functions E_(k) (5∥1) and E_(k) (15∥3) numbered 546 and 547, respectively). The encrypted data of row “0” is related to value “10” of index tree 510 (value “10” is indicated in the left part of symmetric encryption function E_(k) (10∥0)). Nodes having values of “5” and “15” are the sons of a node having the value of “10” and this is the reason why row identifiers “1” and “2” in table 540 (said rows “1” and “2” comprise encrypted values of data related to values “5” and “15”) are located in row

Since the internal index pointers in structure column 542 remain unconcealed, it is possible to perform various administrative operations on index tree 510 represented in the form of table 540. In data column 543 the value of each node of index tree 510 is concatenated to the value of the corresponding external index pointer to table 501. For example, the value “10” is concatenated to the value of the external index pointer to the row identified by “0”, since “10” is located in table 501 in row number “0”: 10∥0. Then, the result of the concatenation of “10” and “0” is encrypted by means of the symmetric encryption function E_(k), such as DES or AES, wherein k is the encryption key: E_(k) (10∥0), as indicated in cell 544. As a result, each index value is concatenated with its unique row identifier. Although index tree 510 can comprise several nodes with equal values, these values are encrypted to different ciphertexts, since a different row identifier is concatenated with each of said equal values.

The use of cell coordinates for the encryption of the database table and of row identifiers for the index entries, according to a preferred embodiment of the present invention, ensures that there is no correlation between the indexed values and the database ciphertext values.

Implementing a Secure μ Function

The implementation of μ affects the ability of the SPDE method and system to protect against substitution and statistical attacks.

Substitution attacks: A secure implementation of μ would generate different numbers for different coordinates in order to protect against substitution attacks:

(t₁, r₁, c₁) ≠ (t₂, r₂, c₂) ⇒ μ(t₁, r₁, c₁) ≠ μ(t₂, r₂, c₂)

Unfortunately, generating a unique number for each database coordinate would result in considerable data expansion. An alternative implementation reducing the data expansion might also result in collisions.

It is assumed that there are two cells for which μ generates two equal values for their coordinates:

∃ t₁, r₁, c₁, t₂, r₂, c₂ | [(t₁, r₁, c₁) ≠ (t₂, r₂, c₂)] ∧ [μ(t₁, r₁, c₁) = μ(t₂, r₂, c₂)]

It is possible to substitute the ciphertext values of these cells (X_(t₁r₁c₁) and X_(t₂r₂c₂)) without μ being corrupted at decryption time. This kind of attack can be prevented by making it hard to find two such cells, which is achieved by using a hash function, for example MD5.

Statistical attacks: A secure implementation of μ that generates different numbers for different coordinates would affect the ciphertext values so that there would be no correlation between the plaintext and the ciphertext values, and thus would protect against statistical attacks. However, statistical attacks can still be performed on the encrypted values, even if μ generates different numbers for different coordinates, when block cipher modes such as CBC (cipher block chaining) are used. In the SPDE method, the size of the unique cell identifier might be larger than the size of one block. It is assumed that a block cipher in CBC mode is used as the encryption function and that a specific implementation of μ, concatenating the coordinates of a cell in order to create a unique representation of its location, is used, as follows:

μ(t,r,c)=t∥r∥c

For example, if t=324, r=451 and c=372, then μ(t,r,c)=t∥r∥c=324451372.

The combination of block ciphers with the above implementation of μ causes information leakage which could be used for statistical attacks. For example, for values located in cells of the same table, the same column and subsequent rows, the unique values created by μ will differ only in the least significant bits. If the number of bytes used to represent μ is larger than the block size of the block cipher used, the first blocks of the two encrypted values will be equal if and only if the first blocks of the plaintext values are equal. The first blocks will probably hold most or all of the sensitive data and thus, statistical attacks are possible by examining the equality of the first blocks of the ciphertext values.

In order to cope with statistical attacks, a secure μ function has to produce values with a low probability of collision in the first block of the μ value. According to a preferred embodiment of the present invention, this goal is achieved by applying a hash function to μ(t,r,c), said hash function generating a hash value from the cell coordinates that always affects the first block in the block cipher. This value is used in order to change the first block of the plaintext value before encryption. Since a collision-resistant hash function is used, even cell coordinates that share equal first blocks and differ only slightly in their least significant bits produce different first blocks after hashing. Thus statistical attacks are ruled out, since if the first blocks in CBC mode are different, then the whole encrypted cell is different.
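The effect described above can be illustrated with a short sketch comparing the concatenating μ with a hashed μ. The block size of four bytes is deliberately tiny so that the effect shows on the nine-digit example from the text. SHA-256 and the function names `mu_concat`, `mu_hashed` and `first_block` are assumptions of this sketch.

```python
import hashlib

BLOCK = 4  # bytes; deliberately tiny so that the effect is visible


def mu_concat(t: int, r: int, c: int) -> bytes:
    # mu(t, r, c) = t || r || c, written as decimal digits
    return f"{t}{r}{c}".encode()


def mu_hashed(t: int, r: int, c: int) -> bytes:
    # Hash applied to the concatenated coordinates (SHA-256 is an assumed choice).
    return hashlib.sha256(mu_concat(t, r, c)).digest()


def first_block(data: bytes) -> bytes:
    return data[:BLOCK]


cell_a = (324, 451, 372)
cell_b = (324, 452, 372)  # next row, same table and column

print(mu_concat(*cell_a))  # b'324451372'
# The concatenated coordinates share their first block ...
print(first_block(mu_concat(*cell_a)) == first_block(mu_concat(*cell_b)))  # True
# ... whereas the hashed coordinates are expected to differ already there.
print(first_block(mu_hashed(*cell_a)) == first_block(mu_hashed(*cell_b)))
```

With the concatenating μ, both cells start with the block `3244`, so the first block contributes nothing that distinguishes the two cells. After hashing, the first blocks differ with overwhelming probability.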

Revocation

Since cell coordinates only relate to the physical location of said database cell in the SPDE method and system, according to a preferred embodiment of the present invention, substitution attacks that substitute a database cell with one of its previous versions would succeed. What is needed is to add another dimension, that of time, to each cell. If database cells were encrypted with this additional dimension, the validity of the version of an encrypted value could be verified just as it is verified that the value is in its correct logical location.

In order to illustrate the need for the additional dimension, a possible attack scenario is described. It is assumed that a database administrator applies the above attack to his account balance just after withdrawing $10,000. Since the account balance values before and after the withdrawal are valid encrypted database cells, both located at the same database coordinates and encrypted with the same key, no one could detect the attack performed by the DBA, since all values are valid (encrypted with the right key and using the correct cell coordinates).

Three ways to cope with this attack are suggested:

-   1. In the Oracle database, a special pseudo-column is used to represent the version of each row within each table. Using this as a representation of the version of a cell would result in the need to re-encrypt the whole row after a particular cell of that row was changed; thus, the structure of the database would change.
-   2. If the update operation is implemented as two subsequent delete and insert operations, then the inserted row will be assigned a different row-id and thus the updated value will be assigned different cell coordinates. The above attack would then be eliminated. In this approach, the whole row is affected after a cell is updated and again the structure of the database would change. However, this representation can be satisfactory when applying the model to bi-temporal databases where there are no updates, only logical deletes that can be regarded as updates on the whole row being deleted. What is needed is a representation of a version at the level of cells that can be used together with the other cell coordinates in order to create a complete representation of time (version) and place (logical database location) of each database cell.
-   3. A unique value can be added to each newly inserted cell, before encryption, that uniquely identifies the value among all the created values. A database sequence can be used in order to create such values. When a value is updated, its previous unique value is added to a revocation list containing the unique values of all cells that have been updated and are therefore revoked. When a database query is executed, the unique value of the current cell is extracted and checked against the revocation list to determine whether the value has been revoked. If not, the value is returned to the user. Obviously, this approach adds high overhead for databases with frequent update operations.
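The third approach can be sketched as follows. Encryption is omitted to keep the example short: in the scheme described above the unique value would be encrypted together with the cell value. The class `VersionedCells`, its methods and the use of `itertools.count` as a stand-in for a database sequence are hypothetical.

```python
import itertools


class VersionedCells:
    """Cells carrying a unique version value plus a revocation list (sketch)."""

    def __init__(self):
        self._sequence = itertools.count(1)  # stands in for a database sequence
        self._cells = {}                     # (t, r, c) -> (unique_value, value)
        self._revoked = set()                # unique values of superseded versions

    def write(self, coords, value):
        old = self._cells.get(coords)
        if old is not None:
            self._revoked.add(old[0])        # the previous version is now revoked
        entry = (next(self._sequence), value)
        self._cells[coords] = entry
        return entry

    def restore(self, coords, entry):
        """Simulates an adversary putting an old stored entry back in place."""
        self._cells[coords] = entry

    def read(self, coords):
        unique_value, value = self._cells[coords]
        if unique_value in self._revoked:
            raise ValueError("stale version: this cell value has been revoked")
        return value


cells = VersionedCells()
balance = (1, 7, 3)                          # hypothetical cell coordinates
old_entry = cells.write(balance, 15000)
cells.write(balance, 5000)                   # balance after the withdrawal
cells.restore(balance, old_entry)            # replay of the earlier version
```

After the replay, `cells.read(balance)` raises `ValueError` instead of returning the old balance. The revocation set grows with every update, which is the overhead noted in item 3.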

A Proposed Encrypted Indexing Method for Supporting the SPDE Method

The SPDE method suggests how to construct a secure index on the encrypted database, so that the time complexity of all queries is maintained. Furthermore, since the database structure remains the same, no changes are imposed on the queries.

A secure database index, encrypted at the single-value level of granularity, is suggested. Best performance and structure preservation are readily obtained, since single-value granularity encryption is used. Information leakage and unauthorized modifications are protected against by using encryption, dummy values and pooling. In addition, a technique that supports discretionary access control in a multi-user environment is presented.

Index Encryption

Assume that a conventional (standard) index entry is of the form:

(V_(trc), IRs, ER), where:

V_(trc): an indexed value in table t, row r and column c.

IRs: the internal pointers (references), i.e. pointers between index entries.

ER: the external pointer (reference), i.e. the pointer to the database row.

An entry in the secure index, according to a preferred embodiment of the present invention, is defined as follows:

(E _(k)(V _(trc)),IRs,E _(k)′(ER),MAC _(k)(V _(trc) ∥IRs∥ER∥SR)), where:

k: an encryption key.

E_(k): a nondeterministic encryption function.

E_(k)′: a conventional encryption function.

SR: the entry self-pointer (reference), which determines the position of the corresponding node in the index. SR is used as a node identifier of the corresponding index.

MAC_(k): a message authentication code function.

The implementation of E_(k) introduces a tradeoff between static leakage and performance. If E_(k) is a non-deterministic encryption function (that is, equal plaintext values are encrypted to different ciphertext values), statistics such as the frequencies and distribution of values are concealed, but comparing index values requires their decryption. On the other hand, if E_(k) is an Order Preserving encryption function, some information about the index values is revealed (e.g., their order), but it is possible to compare values without the need to decrypt them. If E_(k) is an Equality Preserving encryption function, then equal plaintext values are encrypted to equal ciphertext values.

This tradeoff between Security and Performance for E_(k) implementation is shown in Table 2.

TABLE 2
The Tradeoff between Security and Performance for E_(k) implementation.

E_(k) implementation    Security    Performance
Nondeterministic        High        Worst
Equality Preserving     Medium      Low
Order Preserving        Low         Medium
No Encryption           Worst       High

It is suggested to use a non-deterministic E_(k) encryption function. A possible implementation of E_(k) is:

E _(k)(x)=E″ _(k)(x∥r), where:

k: an encryption key.

E″_(k): a conventional encryption function.

r: a random number with a fixed number of bits.

Using the above implementation of E_(k), there is no correlation between E_(k)(V_(trc)) and the corresponding column ciphertext value (random numbers are used before encryption) and thus linkage leakage attacks are eliminated.
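The randomized construction E_(k)(x)=E″_(k)(x∥r) can be sketched as follows. The toy Feistel cipher `E_conv`/`D_conv` only stands in for the conventional encryption function E″_(k) so that the example runs with the standard library. The 8-byte widths for x and r and all function names are assumptions of this sketch.

```python
import hashlib
import hmac
import secrets

HALF = 8  # bytes for x and for the random number r (assumed widths)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function built from HMAC-SHA256 (toy construction).
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def E_conv(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher standing in for E''_k; not for production use."""
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def D_conv(key: bytes, block: bytes) -> bytes:
    """Inverse of E_conv."""
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def encrypt_nd(key: bytes, x: int) -> bytes:
    # E_k(x) = E''_k(x || r), with r drawn fresh for every encryption
    r = secrets.token_bytes(HALF)
    return E_conv(key, x.to_bytes(HALF, "big") + r)


def decrypt_nd(key: bytes, cipher: bytes) -> int:
    # Decrypt, then discard the random suffix r.
    return int.from_bytes(D_conv(key, cipher)[:HALF], "big")
```

Two encryptions of the same value give different ciphertexts (except with negligible probability), so equal index values cannot be matched by comparing ciphertexts. This is also why comparing index values requires decrypting them first.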

Most commercial databases implement indexes like tables (as heap files). In this implementation, index entries are uniquely identified using the pair: page id, defined hereinafter as SR, and slot number, defined hereinafter as IR.

Message authentication codes (MAC) are used to protect against unauthorized modifications of messages. They mix the message cryptographically under a secret key, and the result is appended to the message. The receiver can then recompute the MAC and verify its correctness. It should be impossible for an attacker to forge a message and still be able to compute the correct MAC without knowing the secret key.

According to a preferred embodiment of the present invention, a MAC_(k) function is used in order to protect the index entries against unauthorized modifications.

Spoofing attacks are eliminated, since the MAC value depends on V_(trc), and once E_(k)(V_(trc)) is tampered with, V_(trc) will not match the V_(trc) used in the MAC.

Splicing attacks are eliminated, since the MAC value depends on SR, and trying to substitute two encrypted index entries will be detected, since SR would not match the SR used in the MAC.

Replay attacks can be eliminated by adding a time dimension to each index node. This enables the validity of the node version to be verified, just as ER was used in order to verify its logical location.

The MAC value added to each index entry causes data expansion and thus, its size introduces a tradeoff between security and data expansion.
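The integrity check on an index entry can be sketched with the standard `hmac` module. HMAC-SHA256 is an assumed choice of MAC function, and serializing the fields with `repr` is a hypothetical way of realizing the concatenation V_(trc)∥IRs∥ER∥SR without ambiguity between field boundaries.

```python
import hashlib
import hmac


def entry_mac(key: bytes, value, internal_refs, external_ref, self_ref) -> bytes:
    # MAC_k(V || IRs || ER || SR); repr() keeps the field boundaries unambiguous.
    message = repr((value, tuple(internal_refs), external_ref, self_ref)).encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def is_valid(key: bytes, value, internal_refs, external_ref, self_ref, tag: bytes) -> bool:
    expected = entry_mac(key, value, internal_refs, external_ref, self_ref)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


key = b"mac-key"
# Entry for value 10 stored at index position 0, sons at 1 and 2, table row 0.
tag = entry_mac(key, 10, [1, 2], 0, 0)

print(is_valid(key, 10, [1, 2], 0, 0, tag))  # True: untouched entry
print(is_valid(key, 11, [1, 2], 0, 0, tag))  # False: spoofed value
print(is_valid(key, 10, [1, 2], 0, 3, tag))  # False: entry moved to position 3
```

Truncating the tag would reduce the data expansion at the cost of a higher forgery probability, which is the tradeoff mentioned above.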

The following pseudo-code, according to a preferred embodiment of the present invention, illustrates a query evaluation using the encrypted index, which is assumed to be implemented as a binary tree. However, the pseudo-code can be easily generalized to handle a B-Tree implementation, according to another preferred embodiment of the present invention.

INPUT:
  A table: T
  A column: C
  A value: V
  A query: SELECT * FROM T WHERE T.C >= V
OUTPUT:
  A collection of row-ids.

X := getIndex(T, C).getRootNode( );
While (not X.isLeaf( )) Do
  If (not X.isValid( )) Then
    Throw IllegalStateException( );
  Else
    If X.getValue( ) < V Then
      X := X.getRightSonNode( );
    Else
      X := X.getLeftSonNode( );
    End If;
  End If;
End While;
RESULT := { };
While X is not null and X.getValue( ) < V Do
  X := X.getRightSiblingNode( );
End While;
While X is not null Do
  RESULT := RESULT union {X.getRowId( )};
  X := X.getRightSiblingNode( );
End While;
Return RESULT;

While the isLeaf, getRightSonNode, getLeftSonNode and getRightSiblingNode functions relate to the index structure and their implementation does not change, the getValue and getRowId functions are implemented differently so that encryption and decryption support is added. The function isValid verifies the index entry integrity using the MAC value.

Performance can be further improved if entry verification is performed periodically on the entire index and not as part of each index operation.
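A runnable counterpart of the pseudo-code is sketched below. Several points are assumptions of this sketch rather than part of the described method: values are held in plaintext (a real index would decrypt on access in the value and row-id getters), a boolean `valid` flag stands in for the MAC check, each inner node stores the largest value of its left subtree, and the tree is built over sorted leaves linked by right-sibling pointers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int                        # would be decrypted on access in a real index
    row_id: Optional[int] = None      # set for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    sibling: Optional["Node"] = None  # right sibling, leaves only
    valid: bool = True                # stands in for the MAC verification

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def _build(leaves: List[Node]) -> Node:
    """Builds a binary tree; an inner node holds the maximum of its left subtree."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return Node(value=leaves[mid - 1].value,
                left=_build(leaves[:mid]), right=_build(leaves[mid:]))


def make_index(pairs) -> Node:
    """pairs: iterable of (value, row_id); at least one pair is required."""
    leaves = [Node(value=v, row_id=rid) for v, rid in sorted(pairs)]
    if not leaves:
        raise ValueError("cannot build an index over an empty column")
    for a, b in zip(leaves, leaves[1:]):
        a.sibling = b
    return _build(leaves)


def select_ge(root: Node, v: int) -> List[int]:
    """Row ids for: SELECT * FROM T WHERE T.C >= v."""
    x: Optional[Node] = root
    while not x.is_leaf():
        if not x.valid:
            raise RuntimeError("index entry failed the integrity check")
        x = x.right if x.value < v else x.left
    while x is not None and x.value < v:   # skip leaves that are still too small
        x = x.sibling
    result = []
    while x is not None:                   # collect every remaining leaf
        result.append(x.row_id)
        x = x.sibling
    return result


root = make_index([(10, 0), (5, 1), (15, 2), (7, 3), (20, 4)])
print(select_ge(root, 10))  # [0, 2, 4]
```

The traversal touches one node per tree level and then only the qualifying leaves, so the encrypted index keeps the time complexity of the plain index, with a decryption added per visited node.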

Using Dummy Values and Pooling

In order to cope with dynamic leakage attacks, it is necessary to reduce the level of confidence an adversary has about the effect of newly inserted data on the database indexes. There is a tradeoff between how much of the index is updated and how much information an adversary is able to learn.

According to a preferred embodiment of the present invention, two techniques for reducing the adversary's level of confidence are proposed:

-   a. Dummy values; and
-   b. Pooling.

Dummy values can be inserted into the index with each insertion made by the user, and thus reduce the level of confidence. However, inserting dummy values with each insertion results in data expansion. The number of dummy values added in each insertion determines the level of confidence which an adversary has about the position of a value within the index.

Pooling means collecting incoming elements in a temporary data structure (the pool); at a given time, the whole pool is emptied and its elements are inserted into the data structure into which they were originally meant to be inserted.

FIG. 6A and FIG. 6B are schematic illustrations of database indexing using pooling, according to a preferred embodiment of the present invention. It is suggested to use pooling for security reasons. A fixed size pool 601 is defined for each index 603, said pool holding the newly inserted values. Only when pool 601 is full is index 603 updated with these values. Furthermore, the extraction of values from pool 601 to index 603 should be done in a random order, since this makes it difficult to link the extracted values with their corresponding inserted values. When a query is to be executed, it is first necessary to search pool 601, and then to search the rest of the index. The pool size determines the level of confidence which an adversary has about the position of a value within index 603. A full scan has to be performed on pool 601 whenever index 603 is used. Thus, the size of pool 601 is a privacy-performance tradeoff. Using a pool size that has space complexity of O(log|table size|) will not affect the time complexity of the queries.

Using pool 601, the adversary cannot link an inserted database value to its corresponding index value; the only thing he can do is to link a group of inserted database values and a group of inserted index values. The adversary cannot link a single database value to its corresponding index value. The size of the pool (or the size of the group) determines the level of confidence an adversary has about the position of a value within the index.

FIG. 6A illustrates the database table 602, index 603 and pool 601 after the insertion of, for example, three values: 17, 5, 24, where the pool size is four values. FIG. 6B illustrates the database table 652, index 653 and pool 601 after the insertion of, for example, a fourth value, 36, that fills the pool. After the insertion of the first three values, index 603 is not updated; all the values are added to pool 601 only. After the insertion of the fourth value, 36, pool 601 is emptied, and all of its values are added to index 603, generating a new index 653. This means that the adversary has a probability of ¼ (0.25) of linking a database value (one of the four inserted values) with its corresponding index value.

If the values are extracted from pool 601 in the same order in which they were inserted, then the adversary can still link the database value with its corresponding index value (the first database value with the first value that is extracted from the pool, the second with the second and so on). Therefore, in order to solve this problem, according to a preferred embodiment of the present invention, the values are extracted from the pool in a random order.

When a query is to be executed, it is first necessary to search pool 601, and then to search the rest of the index. A full scan has to be performed on pool 601 whenever the index is used. Thus, the size of pool 601 is a privacy-performance tradeoff. Using a larger pool means a lower level of confidence for the adversary; however, it requires more time. Using a pool size that has space complexity of O(log|table size|) does not affect the time complexity of the queries, since searching index 603 or 653 is of the same order of complexity.

Pool 601 should be kept in a secure memory location in the server, so that the adversary is not able to observe dynamic changes in the pool itself. Such secure locations can be provided using dedicated hardware.
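The pooling behaviour of FIG. 6A and FIG. 6B can be sketched as follows. A sorted Python list stands in for the index tree, encryption is omitted, and the class `PooledIndex` with its method names is hypothetical.

```python
import bisect
import random


class PooledIndex:
    """Index with a fixed-size pool that is flushed in random order (sketch)."""

    def __init__(self, pool_size: int):
        self.pool_size = pool_size
        self.pool = []    # newly inserted values, not yet visible in the index
        self.index = []   # sorted list standing in for the index tree

    def insert(self, value) -> None:
        self.pool.append(value)
        if len(self.pool) == self.pool_size:
            self._flush()

    def _flush(self) -> None:
        random.shuffle(self.pool)             # random extraction order
        for value in self.pool:
            bisect.insort(self.index, value)
        self.pool.clear()

    def contains(self, value) -> bool:
        if value in self.pool:                # full scan of the pool
            return True
        i = bisect.bisect_left(self.index, value)
        return i < len(self.index) and self.index[i] == value


idx = PooledIndex(pool_size=4)
for v in (17, 5, 24):
    idx.insert(v)
print(len(idx.pool), idx.index)   # 3 []            (state of FIG. 6A)
idx.insert(36)
print(len(idx.pool), idx.index)   # 0 [5, 17, 24, 36] (state of FIG. 6B)
```

Every lookup pays for a scan of the pool in addition to the index search, which is why a pool of size O(log|table size|) leaves the query time complexity unchanged.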

Supporting DAC in Indexes

If indexes are used only by one user or if they are never updated, it is possible to maintain a local index for each user. Securing indexes stored locally is relatively easy. However, such local indexes do not work well in a multi-user environment, since synchronizing them is difficult. Thus, it is necessary to store the indexes in one site, such as the database server, and share them between users. A fundamental problem arises when multiple users share the same encrypted index and each user has different access rights.

According to a preferred embodiment of the present invention, a solution to this problem is suggested: splitting the index into several sub-indexes where each sub-index relates to values in the column encrypted using the same key.

FIG. 7 illustrates the use of sub-indexes, according to a preferred embodiment of the present invention. Different shades of cells in a column 701 of a table denote different security groups, i.e. cells which are encrypted using different keys. After splitting the index into sub-indexes A, B and C, numbered 711, 712 and 713 respectively, each sub-index is related to values in column 701 encrypted using the same encryption key, and each value of column 701 is referenced by only one sub-index, such as sub-index A, B or C. In order to evaluate a query 720, only ciphertext values with the same access right are queried. All the values in each sub-index belong to the same security group (and are thus encrypted using the same key), and thus the problem of accessing the entire index or the indexed elements by users who belong to different security groups is eliminated. Otherwise, the users who belong to different security groups could not access the entire index or the indexed values, since said entire index or the indexed values would be beyond their access rights. When a value is inserted, it is inserted only into the sub-index of the appropriate security group. If this security group does not exist, a new sub-index is created.

When creating an index for column 701, the column is marked as indexed but nothing is actually created, since the encryption keys are missing. When a user queries column 701 for the first time or executes a dedicated command, the sub-indexes for his security groups are created (if they do not already exist).

In order to create a sub-index, such as sub-index A, B or C, it is necessary to know which of the values of column 701 belong to the specific security group. According to a preferred embodiment of the present invention, this can be done in several ways:

-   a. “Brute force”: trying to decrypt each of the column values. If a value is decrypted successfully, it belongs to the specific security group; otherwise it does not.
-   b. “Forced Sub Indexes”: supposing that each encrypted column, such as column 701, is indexed, so that when a value is inserted into the database it is immediately inserted into the corresponding sub-index, such as sub-index A, B or C.
-   c. “Explicit”: each encrypted value is related to the corresponding security group, or for each security group a list of all its encrypted values is kept. Therefore, it is known what needs to be added to the corresponding sub-index, such as sub-index A, B or C.

FIG. 8 illustrates how a query is executed using sub-indexes, according to a preferred embodiment of the present invention. First, client 202 connects to database server 203 and identifies himself, for example by using a smart card, such as a CompactFlash® card. After client 202 has been identified, a secure session between client 202 and database server 203 is created at step 801. In this secure session everything that is transmitted between client 202 and database server 203 is encrypted and secured, for example by using SSL. The client transfers his one or more encryption keys to database server 203 at step 802. The keys represent the security groups of client 202. The encryption keys can be supplied by means of the smart card. The encryption keys are revealed to database server 203 during the whole session. At step 803, during the secure session, client 202 submits queries to database server 203. At step 804, database server 203 locates the sub-indexes 811, 812 and 813 which client 202 is entitled to access. This can be done if database server 203 maintains a directory that maps a security group to the corresponding sub-index. The security groups that database server 203 keeps in this directory are not the encryption keys themselves, since the directory entries are revealed. In order to determine the security group that corresponds to an encryption key, a simple calculation can be done by using a hash function (the security group is the hash value of an encryption key). At step 805, the query is executed on the corresponding located indexes 811, 812 and 813. The result of the query of client 202 is transferred to said client 202 at step 806.
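The directory lookup of step 804 can be sketched as follows. SHA-256 is an assumed choice for the hash that derives a security group from an encryption key. The class `SubIndexDirectory`, its methods and the opaque placeholder strings used for encrypted entries are hypothetical.

```python
import hashlib
from collections import defaultdict


def security_group(key: bytes) -> str:
    # The security group is the hash value of an encryption key,
    # so the directory never has to store the key itself.
    return hashlib.sha256(key).hexdigest()


class SubIndexDirectory:
    """Maps each security group to its sub-index (sketch)."""

    def __init__(self):
        self._sub_indexes = defaultdict(list)  # group -> [(encrypted entry, row id)]

    def insert(self, key: bytes, encrypted_entry: str, row_id: int) -> None:
        # A sub-index is created on first use of a security group.
        self._sub_indexes[security_group(key)].append((encrypted_entry, row_id))

    def locate(self, session_keys):
        """Returns the sub-indexes reachable with the keys supplied for the session."""
        groups = [security_group(k) for k in session_keys]
        return {g: self._sub_indexes[g] for g in groups if g in self._sub_indexes}


directory = SubIndexDirectory()
key_a, key_b = b"key-of-group-A", b"key-of-group-B"
directory.insert(key_a, "<ciphertext 1>", 0)
directory.insert(key_a, "<ciphertext 2>", 3)
directory.insert(key_b, "<ciphertext 3>", 1)

found = directory.locate([key_a])
print(len(found), len(found[security_group(key_a)]))  # 1 2
```

A client that presents only the key of group A reaches the two entries of sub-index A and nothing from sub-index B, which mirrors the per-group separation of FIG. 7.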

Analysis of the SPDE System and Method Properties

The proposed SPDE database encryption system and method, according to a preferred embodiment of the present invention, satisfies most of the desired properties of a database encryption method mentioned in the “Background” section:

-   1. Security: The security of the proposed SPDE database encryption system and method, according to a preferred embodiment of the present invention, relies on the security of the encryption algorithm used. In order to reveal a database value, it has to be decrypted using the correct key. Thus, by employing strong encryption algorithms such as AES with a key size of 128 bits, the encryption method is computationally secure.
-   2. Performance: Encryption and decryption are fast operations and mandatory in any database encryption method. The proposed implementation adds the overhead of a XOR operation and the μ computation, which are negligible compared to encryption. Furthermore, the overhead of the proposed SPDE database encryption system and method, according to a preferred embodiment of the present invention, only adds a constant to the overall time complexity of the database operations.
-   3. Data Volume: Using encryption algorithms such as DES or AES, which are block ciphers, results in data expansion (in many cases this expansion is negligible), since the size of the ciphertext is a multiple of the block size. However, even when block ciphers are used, the database expansion caused by the new method is a constant and has no effect on the database size complexity.
-   4. Decryption Granularity: The basic element of reference is a database cell. Operations on a cell do not depend on or have any effect on other cells.
-   5. Encrypting different columns under different keys: The proposed SPDE database encryption system and method, according to a preferred embodiment of the present invention, facilitates subschema implementation. Since each cell is encrypted separately, each column can be encrypted under a different key. Moreover, implementations needing row level access control can also be supported, since each cell can be encrypted using a different key.
-   6. Resistance to pattern matching and substitution attacks: The proposed SPDE database encryption system and method, according to a preferred embodiment of the present invention, prevents pattern matching attacks, since there is no correlation between a plaintext value and a ciphertext value (achieved by using encryption) and there is no correlation between various ciphertext values (achieved by using μ before encryption). Two equal plaintext values will be encrypted to two different ciphertext values, since the database encryption method encrypts the values with their unique positions. Substitution attacks are also prevented.
-   7. Unauthorized access detection: Unauthorized manipulation of the encrypted data without the encryption key would be noticed at decryption time.
-   8. Maintaining DB structure: The SPDE database encryption system and method, according to a preferred embodiment of the present invention, complies with the structure preserving requirements. Since the basic element of reference is a database cell, no changes are needed to the database internal files. Moreover, since the DBMS has access to all the encryption keys during the session, values are decrypted as required, allowing the internal algorithms and the user interface commands to remain unchanged.

Implementing the SPDE Method in Commercial DBMSs

The following subsections disclose the issues that have to be addressed when implementing the SPDE method and system, according to a preferred embodiment of the present invention, in a commercial DBMS, such as Oracle 9i®.

Oracle's® Object Types

Oracle® implements objects similarly to packages. An instance of an object type can be stored in the database for later use, like any other data type. The instance of an object is defined by the values of its elements, with its member functions defined in the type body. Object types also have constructors implementing the instantiation of an object when it is first created. SQL queries performed on the object types evaluate the relation between two objects using a special member function, the order function, which needs to be implemented. Once the order function is implemented, all SQL queries performed on the objects execute naturally without any need for query translation.

Implementing the SPDE Method

Using Oracle's® object types, the encryption and decryption operations of the SPDE system and method, according to a preferred embodiment of the present invention, have been implemented. The new objects encapsulate the whole encryption process, while the decryption process is transparent to the user executing regular SQL statements.

The Secure Object is defined, for example, as follows:

CREATE OR REPLACE TYPE SecureObject AS OBJECT (
  objectId        NUMBER,      -- identifier used to locate the object at decryption
  ciphertextValue RAW(1024),   -- encrypted value of the object
  actualSize      NUMBER,      -- size of the value before padding
  CONSTRUCTOR FUNCTION SecureObject (plaintextValue UserDataType)
    RETURN SELF AS RESULT,
  ORDER MEMBER FUNCTION match (sec SecureObject) RETURN INTEGER
);

Here, objectId is used for the decryption process, ciphertextValue is the encrypted value of the object, and actualSize holds the actual size of the value before padding, which is used during the decryption process in order to discard the pad. The data type of the encrypted object that is defined in the constructor is selected according to the type of the column being encrypted.

The defined exemplary Secure Object is an entity gathering all encryption and decryption operations of the SPDE system and method, according to a preferred embodiment of the present invention. A user querying a database encrypted by means of the SPDE method, according to a preferred embodiment of the present invention, issues the same query as he would issue if said database were a conventional database, which is not encrypted by means of said SPDE method.

The defined exemplary Secure Object comprises three variables (objectId, ciphertextValue and actualSize) and two functions (SecureObject and match).

The variable objectId is used for decryption operations. Since the position of an object must be known in order to decrypt its value, and the position of said object is not revealed while the results of the user's query are obtained, each object is assigned a special identifier. After obtaining each object from the encrypted database, it is possible to determine said object's position by means of this special identifier. This is performed transparently to the user by means of the match function. The ciphertextValue variable keeps the encrypted string in binary form. The actualSize variable keeps the size of the original string before encryption. The operation of storing the original string size is performed transparently to the user by means of the SecureObject function.

The match function is called by the database each time there is a need to perform a comparison between two encrypted objects. The database calls the match function transparently to the user: the user performs a conventional SQL command and is not aware that the database uses the match function in order to evaluate his query. The match function obtains the objectId of an object to be compared and obtains the position of said object by means of said special identifier, objectId. The match function then decrypts the value stored in ciphertextValue by means of the encryption keys received from the user during the session. After the decryption of the above value, the position of said value is used in order to obtain the original string before encryption.

The SecureObject function is called by the database in order to encrypt the values inserted into said database by the user. The SecureObject function stores the size of the original value before encryption in the variable actualSize, accesses the database and obtains the next position in said database at which the new value will be inserted. The SecureObject function encrypts the value together with the position at which said value will be inserted, by means of the encryption key received from the user during the session. The SecureObject function stores the encrypted string in the ciphertextValue variable, and the object is stored in the database.

In order to encrypt a column of one of the database tables, the column type has to be defined as “secure object type”. Moreover, instead of the insert statement “insert (‘1’, . . . )”, the user will have to perform the following statement “insert (SecureObject(‘1’), . . . )”, indicating that the new inserted value is of “secure object type”. Oracle's® object types are used in order to encapsulate the whole encryption process during insertion.

The constructor of the object SecureObject initializes the new object as follows:

INPUT: Plaintext Value.

OUTPUT: Object Encrypted According to the SPDE Method.

1. The new object is assigned a unique identifier by the DBMS.

2. The cell coordinates of the new object are retrieved from the database.

3. The μ function for these coordinates is computed.

4. The object's plaintext value is encrypted with μ as described in section.

5. The created object is stored in the database.
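The constructor steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Oracle implementation: `toy_encrypt` is a small Feistel network built on HMAC-SHA256 that merely stands in for DES or AES and must not be used to protect real data; μ is assumed to be a hash of the concatenated table, row and column identifiers; zero-byte padding and the names `mu`, `encrypt_cell` and `decrypt_cell` are hypothetical choices made for illustration.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (the AES block size)

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Keyed round function of the toy Feistel network.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a real block cipher; NOT secure.
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        l, r = r, _xor(l, _round(key, i, r))
    return l + r

def toy_decrypt(key: bytes, block: bytes) -> bytes:
    l, r = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        l, r = _xor(r, _round(key, i, l)), l
    return l + r

def mu(table: str, row: int, column: str) -> bytes:
    # Assumed form of mu: hash of the concatenated cell coordinates.
    return hashlib.sha256(f"{table}|{row}|{column}".encode()).digest()[:BLOCK]

def encrypt_cell(key: bytes, plaintext: bytes, table: str, row: int, column: str):
    # Pad to one block, XOR with mu, then encrypt.
    # Returns (ciphertext, actual_size), mirroring ciphertextValue and actualSize.
    if len(plaintext) > BLOCK:
        raise ValueError("this sketch handles a single block only")
    padded = plaintext.ljust(BLOCK, b"\x00")
    return toy_encrypt(key, _xor(padded, mu(table, row, column))), len(plaintext)

def decrypt_cell(key: bytes, ciphertext: bytes, actual_size: int,
                 table: str, row: int, column: str) -> bytes:
    padded = _xor(toy_decrypt(key, ciphertext), mu(table, row, column))
    return padded[:actual_size]  # discard the pad

key = b"session-key"
c1, n1 = encrypt_cell(key, b"same value", "T", 1, "C")
c2, n2 = encrypt_cell(key, b"same value", "T", 2, "C")
```

In this sketch the same plaintext stored in two different rows yields two different ciphertexts, and a ciphertext decrypted with the coordinates of another cell does not give back the original value.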

Updates are performed in the same way as insertions, the only difference being that updates use the original cell coordinates of the updated cell during the encryption. Delete operations require no special modification.

In order to perform a query, the predefined interface that Oracle's objects supply for comparison between two objects is used. An order function for the secure objects is defined as the relation between their decrypted values. After defining the order between two database objects, all queries can be executed without any changes to the queries operating on the encrypted database.

The order function is defined as follows:

INPUT: Two Encrypted Objects.

OUTPUT: The Order Between The two Objects {‘<’, ‘=’, ‘>’}.

1. The cell coordinates of both compared values are retrieved.

2. The μ function is computed for each of the compared values.

3. Both values are decrypted using their μ values found in step 2 above.

4. The order between the two objects is defined as the order between the plaintext values found in step 3 above.
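The four steps above can be sketched as follows. The sketch is written in Python rather than PL/SQL and rests on assumptions: `toy_encrypt` and `toy_decrypt` form a toy Feistel stand-in for the real cipher (not secure), μ is assumed to be a hash of the cell coordinates, and the `SecureObject` class, its `coords` field and the `store` helper are hypothetical. In particular, the coordinates are kept on the object here for brevity, whereas the implementation described in this document looks them up through objectId.

```python
import hashlib
import hmac
from dataclasses import dataclass

HALF = 8  # half of a 16-byte block

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]

def toy_encrypt(key: bytes, block: bytes) -> bytes:  # toy stand-in, NOT secure
    l, r = block[:HALF], block[HALF:]
    for i in range(4):
        l, r = r, _xor(l, _f(key, i, r))
    return l + r

def toy_decrypt(key: bytes, block: bytes) -> bytes:
    l, r = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        l, r = _xor(r, _f(key, i, l)), l
    return l + r

def mu(coords: tuple) -> bytes:
    # Assumed mu: hash of the (table, row, column) coordinates.
    return hashlib.sha256(repr(coords).encode()).digest()[:2 * HALF]

@dataclass
class SecureObject:
    coords: tuple      # (table, row, column); step 1 retrieves these
    ciphertext: bytes

def store(key: bytes, value: int, coords: tuple) -> SecureObject:
    block = value.to_bytes(2 * HALF, "big")
    return SecureObject(coords, toy_encrypt(key, _xor(block, mu(coords))))

def plaintext(key: bytes, obj: SecureObject) -> int:
    # Steps 2 and 3: compute mu for the cell, decrypt, remove mu.
    block = _xor(toy_decrypt(key, obj.ciphertext), mu(obj.coords))
    return int.from_bytes(block, "big")

def match(key: bytes, a: SecureObject, b: SecureObject) -> int:
    # Step 4: order of the plaintext values, returned as -1, 0 or 1.
    pa, pb = plaintext(key, a), plaintext(key, b)
    return (pa > pb) - (pa < pb)

key = b"session-key"
a = store(key, 7, ("T", 1, "C"))
b = store(key, 7, ("T", 2, "C"))
c = store(key, 9, ("T", 3, "C"))
```

Here `a` and `b` hold the same plaintext under different ciphertexts, yet `match` reports them as equal because the comparison is made on the decrypted values.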

Implementing Encryption

An important issue is when to perform the encryption. If an object is updated, it can be encrypted with the row-id of the row about to be updated before it is stored in the database, using “before update” triggers. However, a new object that is about to be inserted has no row-id, since the row has not yet been inserted, and a way has to be found to retrieve the next row-id of the table into which the object is about to be inserted.

The difficulty in obtaining the next row-id of the table can be overcome by using pseudo-code as follows:

INPUT: Table name.
OUTPUT: The next row-id of that table.

Start Autonomous Transaction
    Insert into <table name> values (dummy_value, ...);
    nextRowId := Dbms_sql.get_last_rowid( );
    rollback;
End Autonomous Transaction
Return nextRowId;

The above pseudo-code uses a mechanism called an autonomous transaction. Declaring a code block as an autonomous transaction guarantees that all DML operations performed within this block can be committed (or rolled back) without influencing the main transaction that called the autonomous transaction in the first place. In the above pseudo-code a dummy value is inserted into the table. The row-id of the inserted row is then obtained by means of the Dbms_sql.get_last_rowid function, which determines the row identifier (row-id) at which the dummy value was inserted. Since this is an autonomous transaction block, the insertion can be rolled back without affecting any other transaction (mainly the transaction that is about to insert an object into the table and is asking for its row-id). After the execution of this procedure, the row-id of the next row of that table (the variable nextRowId) is obtained and is returned to the function that called the autonomous transaction. A way to use this function in the “before insert” trigger now needs to be found. However, since a dummy value is inserted into the same table, the insertion will fire the trigger again. In order to overcome this problem, a special value should be used when inserting the dummy value, informing the trigger not to call the function.
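The probe-and-roll-back idea can be tried out with Python's built-in sqlite3 module. This is only an analogue and not the Oracle mechanism: SQLite has no autonomous transactions, so the sketch swaps in a savepoint, and SQLite's integer rowid stands in for Oracle's row-id. The function name `next_rowid`, the table `t` and its `payload` column are hypothetical.

```python
import sqlite3

def next_rowid(conn: sqlite3.Connection, table: str) -> int:
    """Insert a dummy row, note its rowid, then undo the insertion."""
    conn.execute("SAVEPOINT probe")
    try:
        # Table names cannot be bound as parameters; `table` must be trusted.
        cur = conn.execute(
            f"INSERT INTO {table} (payload) VALUES (?)", (b"dummy",)
        )
        return cur.lastrowid
    finally:
        conn.execute("ROLLBACK TO probe")  # discard the dummy row
        conn.execute("RELEASE probe")      # close the savepoint

# isolation_level=None: the sqlite3 module opens no transactions implicitly,
# so the savepoint above is the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (payload BLOB)")
conn.execute("INSERT INTO t (payload) VALUES (?)", (b"existing",))

predicted = next_rowid(conn, "t")
actual = conn.execute(
    "INSERT INTO t (payload) VALUES (?)", (b"real",)
).lastrowid
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

In this analogue the predicted row-id matches the row-id actually assigned to the next insertion, and the dummy row leaves no trace in the table, provided no other insertion takes place in between.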

Here the use of objects again becomes useful. All objects have constructors that are used in order to instantiate them. If the above function is called from inside the object constructor, the whole encryption process is encapsulated within the object.

Two assumptions are made when using the above code:

a. The first assumption is that there will be only one insertion at a time. If some value (call it the second value) was physically inserted into the same table before another value (call it the first value) was physically inserted, but after the first value called the above procedure, then two values are encrypted with the same row-id, one of which is wrong. In the Oracle® database, a transaction that inserts a record into a table has a lock on the table to ensure that this kind of scenario is impossible.

b. The second assumption is that the row-id of the dummy value would be the same as that of the real value. However, this assumption does not always hold, since a row with 16 bytes of data can be inserted at a different row-id than, for example, a row with 64 bytes of data, depending on the database fragmentation. Thus, in order for the second assumption to be valid, the dummy value needs to be of the same size as the real value.

Implementing Decryption

In order to retrieve the plaintext value of some cell, there is a need to retrieve the cell's coordinates. The row-id of the object cannot be referenced directly, since it is not part of the table and the object has no attribute through which it can obtain its current row-id. If there were such an attribute, it would simplify the decryption process. However, if a unique sequence number is kept for each object as one of the object attributes, it can be used in order to retrieve the current object.

The following pseudo-code illustrates a decryption procedure which can be implemented as a member function of the encrypted object in order to retrieve the object's row-id using its object-id:

INPUT: Table name, Column name.
OUTPUT: The Decrypted Value.

Select rowid into currentRowId
    from <Table name>
    Where <Column name>.getObjectId = SELF.objectId;
Return currentRowId;

(SELF is a reference to the object that is used to access the particular instance of the object from the scope of its member functions.)

The row-id (row identifier) of the value to be decrypted must be known. The object is obtained from the table and is identified by means of the objectId variable. While the object is being obtained, its row-id (the rowid variable) is obtained as well. At the end of the above pseudo-code, the row-id of the object, held in the currentRowId variable (which is equal to the rowid variable), is returned to the function that called the above procedure, for decryption.

Object-ids of the encrypted objects are not encrypted or secured in any way, since the only use of these values is in retrieving the corresponding row-id for a particular object. If object-ids are substituted or corrupted, it will still be possible to retrieve the correct row-id from the object-ids, since the actual value of the object-id is only used in order to find the object during decryption. One limitation regarding object-ids is that they have to be unique. This can be enforced using a unique constraint on the object-id values.

If an index on these object identifiers is built, the only overhead besides decryption is the overhead of another unique index scan for each value decrypted. However, the index has to ensure that changing the reference of the index to the database row is impossible.
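The lookup described above amounts to a unique-index scan on the object identifier. A minimal sketch using Python's built-in sqlite3 module follows; SQLite's rowid stands in for Oracle's row-id, and the table and column names (`secure_t`, `object_id`, `ciphertext`) as well as the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE secure_t (object_id INTEGER NOT NULL, ciphertext BLOB)"
)
# The unique index both enforces unique object-ids and serves the lookup.
conn.execute("CREATE UNIQUE INDEX secure_t_oid ON secure_t (object_id)")
conn.executemany(
    "INSERT INTO secure_t (object_id, ciphertext) VALUES (?, ?)",
    [(501, b"\x01"), (502, b"\x02"), (503, b"\x03")],
)

def current_rowid(conn: sqlite3.Connection, object_id: int) -> int:
    """Return the row-id of the row that currently holds `object_id`."""
    row = conn.execute(
        "SELECT rowid FROM secure_t WHERE object_id = ?", (object_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no object with id {object_id}")
    return row[0]

def duplicate_rejected(conn: sqlite3.Connection) -> bool:
    """Try to insert an already used object-id; the unique index refuses it."""
    try:
        conn.execute(
            "INSERT INTO secure_t (object_id, ciphertext) VALUES (502, x'00')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling `current_rowid(conn, 502)` returns the row-id of the second inserted row, and `duplicate_rejected(conn)` shows the uniqueness requirement being enforced by the index rather than by application code.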

In order that SQL queries perform naturally within the database without changing the database queries, the order member functions of the Oracle® database object types are used, and the relation between two objects is declared to be the relation between their plaintext values. This enables the use of order, group, join and select operations without the need to change the database queries. Furthermore, if a data integrity check needs to be performed (unique constraints, foreign key constraints, etc.), it is performed after the DML operation without any special arrangements. The whole process of evaluating the order between two encrypted values for any use is concealed by the objects.

Comparing the Encrypted Values to Plaintext Values

After encrypting the database values, each encrypted database cell is represented by an object. When performing a query, this object is used in order to compare the object (encrypted cell) to other database objects (encrypted cells). Now assume that the user asks for all values equal to a given plaintext value (e.g., the number ‘5’ or the string “abc”). If the object's order function is used, then a new encrypted object has to be created from the user's given plaintext value. However, the new object will be encrypted using the next database row-id. When the object's order function attempts to compare objects in the database with the new object in order to answer the user's query, it would try to decrypt each value using its cell coordinates. Since the new encrypted value is not in a database table, there are no cell coordinates which can be used, and the row-id with which the value was encrypted cannot be reconstructed, as there might have been new insertions changing the “next row-id” value since the time the query was first executed and the value encrypted. Thus, creating a new encrypted object in order to answer a user's query is not effective in this case. A cast operation is needed that creates a new secure object without encrypting the object with its cell coordinates. This new object should be marked as not encrypted, so that when the order function compares it to other objects, it will not be decrypted. Using a cast function returning an object ensures that comparing the values in the database to plaintext values is encapsulated by the object. However, if it were possible to implement a user-defined order function between objects and other data types in Oracle®, the use of the cast operation could be avoided.
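One way to picture the cast operation is as a constructor that wraps the plaintext in a secure object and sets a flag telling the order function to skip decryption. The Python sketch below is illustrative only: the `encrypted` flag, the `cast` helper and the `decrypt` callback are hypothetical names, and the byte-wise XOR in the usage lines is a placeholder for real decryption with cell coordinates.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SecureObject:
    payload: bytes
    encrypted: bool = True  # False marks an object produced by cast()

def cast(plaintext: bytes) -> SecureObject:
    """Wrap a query literal without encrypting it with cell coordinates."""
    return SecureObject(plaintext, encrypted=False)

def match(a: SecureObject, b: SecureObject,
          decrypt: Callable[[SecureObject], bytes]) -> int:
    """Order function: decrypt only operands that are marked as encrypted."""
    pa = decrypt(a) if a.encrypted else a.payload
    pb = decrypt(b) if b.encrypted else b.payload
    return (pa > pb) - (pa < pb)

# Placeholder "cipher" so that the sketch runs: XOR every byte with 0x5A.
def placeholder_decrypt(obj: SecureObject) -> bytes:
    return bytes(x ^ 0x5A for x in obj.payload)

stored = SecureObject(bytes(x ^ 0x5A for x in b"abc"))  # a cell holding "abc"
hit = match(stored, cast(b"abc"), placeholder_decrypt)
miss = match(stored, cast(b"abd"), placeholder_decrypt)
```

Because the cast object never passes through encryption, the comparison does not depend on any “next row-id” value that may have changed since the query was issued.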

Stable Cell Coordinates

The proposed method assumes that cell coordinates are stable. Thus, DML (data manipulation language) operations such as insert, update and delete do not modify the coordinates of existing cells. If, for example, some cell coordinates change after a row is deleted from a table, then all cells encrypted using these cell coordinates will be corrupted after decryption. In the Oracle 9i® DBMS, cell coordinates are stable; thus, DML operations do not change the cell coordinates of any other cells. This property also ensures that DML operations do not impose the reconstruction of existing database indexes, since indexes use row-ids as pointers to the indexed database records.

A database reorganization process may change cell coordinates. For example, IMPORT and EXPORT operations are used in order to transfer the database content to a flat file and from there to some other (possibly the same) database. If the data is exported by a user having the encryption key, then the database content may be exported as plaintext, and its content may be encrypted during the import process with the newly allocated cell coordinates. If the data is exported by a user not possessing the encryption key, for example the DBA, the data is exported exactly as retained in the database. During the export, the cell coordinates are attached to each encrypted cell. When importing the data, the encryption keys are required, since the values have to be decrypted. The decryption process uses the cell coordinates attached to each value during the export in order to obtain the plaintext value of each cell. After the plaintext values are obtained, they are encrypted with the new cell coordinates in the database into which the values are imported.
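The import step for data exported without the key can be sketched as: decrypt each value with the coordinates attached to it during export, then encrypt it again with the newly allocated coordinates. The Python sketch below makes the same assumptions as any toy model of the scheme: `toy_encrypt` and `toy_decrypt` are a Feistel stand-in for the real cipher (not secure), μ is assumed to be a hash of the cell coordinates, and `seal`, `unseal`, `reimport` and the coordinate mapping are hypothetical.

```python
import hashlib
import hmac

HALF = 8  # half of a 16-byte block

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]

def toy_encrypt(key: bytes, block: bytes) -> bytes:  # toy stand-in, NOT secure
    l, r = block[:HALF], block[HALF:]
    for i in range(4):
        l, r = r, _xor(l, _f(key, i, r))
    return l + r

def toy_decrypt(key: bytes, block: bytes) -> bytes:
    l, r = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        l, r = _xor(r, _f(key, i, l)), l
    return l + r

def mu(coords: tuple) -> bytes:
    # Assumed mu: hash of the (table, row, column) coordinates.
    return hashlib.sha256(repr(coords).encode()).digest()[:2 * HALF]

def seal(key: bytes, block: bytes, coords: tuple) -> bytes:
    return toy_encrypt(key, _xor(block, mu(coords)))

def unseal(key: bytes, ciphertext: bytes, coords: tuple) -> bytes:
    return _xor(toy_decrypt(key, ciphertext), mu(coords))

def reimport(key: bytes, exported, new_coords_for) -> dict:
    """exported: iterable of (old_coords, ciphertext) pairs.
    Returns {new_coords: ciphertext bound to the new coordinates}."""
    imported = {}
    for old_coords, ciphertext in exported:
        plain = unseal(key, ciphertext, old_coords)   # uses attached coordinates
        new_coords = new_coords_for(old_coords)       # allocated by the target DB
        imported[new_coords] = seal(key, plain, new_coords)
    return imported

key = b"session-key"
value = b"16-byte payload!"           # exactly one block
old = ("T", 7, "C")
new = ("T", 107, "C")
exported = [(old, seal(key, value, old))]
imported = reimport(key, exported, lambda c: (c[0], c[1] + 100, c[2]))
```

After the import, the value decrypts correctly only with its new coordinates, which is why the encryption key must be available at import time.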

Transforming a Regular Database to an Encrypted Database

In order to transform a regular database into an encrypted database using the SPDE database encryption system and method, according to a preferred embodiment of the present invention, a parallel database containing all the regular database tables is created. For each type that is used in a regular database table as a column type, a secure object of the same type is created, and the column is declared to be of that object type. All the constraints and foreign keys are copied as is. Triggers or packages comparing plaintext values to values in the database need to be changed so that a cast operation is performed on the plaintext values. Secure indexes on the encrypted tables need to be created, since regular indexes, if created, would expose the order of the indexed values. All queries remain the same; thus, the changes do not affect the database software.

Evaluation Environment

The SPDE method and system, according to the present invention, were implemented and evaluated in the Oracle® 9i DBMS environment. The standard obfuscation toolkit that comes with the Oracle® database was used in order to perform DES encryption. The SPDE method and system, according to the present invention, were implemented using the Object Type that was implemented in the Oracle® 9i database. During the evaluation, a table with one column, containing a data payload of 128 bytes stored in an Oracle object type, was used.

Evaluation Goal—

A goal in the following evaluations is to measure the constant that the implementation of the SPDE method and system, according to the present invention, adds compared with two test methods and systems: a method and system that apply encryption without cell coordinates, and a method and system without encryption.

Evaluation Parameter—

The parameter that is measured in order to evaluate the SPDE method and system, according to the present invention, is the CPU time, since most of the overhead of the SPDE scheme is attributed to CPU time.

Evaluation Plan—

In order to evaluate the encryption and decryption operations, two main database operations were chosen for evaluation: insertions and selections. Each insertion or selection in the SPDE method and system, according to the present invention, consists of three main operations: insertion (or selection) of an object, retrieval of the object cell coordinates, and encryption (or decryption). The CPU time in each of these three cases is measured by building a different system for each case. The first system is the SPDE system, according to the present invention. The second system encrypts the object as in the SPDE system, according to the present invention, but without retrieving its row-ids. This system is referred to as the NDE system (Naive Database Encryption). The third system only stores the value as a plaintext value in an object. The third system is referred to as the OWE system (Object Without Encryption).

Experiment No. 1 Insertions

The CPU time of n subsequent insertions is measured using the SPDE system, according to the present invention. Also measured are n subsequent insertions using the NDE system and n subsequent insertions using the OWE system, in each case into a truncated (empty) table. The value of n was selected between 5 and 50. The overhead of the SPDE system, according to the present invention, is constant, and the goal is to find this constant, added by said SPDE system implementation.

The results obtained from measuring the CPU time in the SPDE system, according to the present invention, are compared to the CPU time of the NDE and OWE systems.

It was found that, in the implementation of the SPDE system, according to a preferred embodiment of the present invention, the constant overhead in the case of insertions is a factor of 12.62 between the OWE (Object Without Encryption) and SPDE systems and a factor of 4.99 between the NDE (Naive Database Encryption) and SPDE systems. The factor obtained for the SPDE system, according to a preferred embodiment of the present invention, compared to the NDE system is caused by the operation of retrieving the row-ids, since rollbacks and insertions are CPU-expensive operations. This overhead could be avoided if Oracle® supplied an efficient way to retrieve the next row-id of a value about to be inserted, which could be used instead of the dummy-insertion mechanism described above.

Experiment No. 2 Queries

In order to evaluate the overhead of selections using the SPDE system, according to the present invention, compared to the NDE and OWE systems, a query is performed using each of those systems on a table with n records, where n is between 5 and 50. Each of the queries performed a full table scan on the encrypted table, since no index was defined on the table. However, the constant value that was obtained in this experiment represents the constant overhead of the decryption operation when queries are used.

The results obtained from measuring the CPU time in the SPDE system, according to the present invention, are compared to the CPU time of the NDE and OWE systems.

It was found that, in this implementation of the SPDE system, according to the present invention, the constant overhead in the case of selections is a factor of 15.86 between the OWE and SPDE systems and a factor of 1.11 between the NDE and SPDE systems. The factor obtained between the OWE and SPDE systems is simply caused by the decryption process. The degradation in performance between the SPDE and NDE systems is caused by the process of fetching the cell coordinates of the object. If the retrieval of cell coordinates were supported by Oracle®, the 11% overhead of fetching the cell coordinates would have been avoided.

Experiment Analysis

The experiments above show that the SPDE method and system, according to the present invention, add only a constant factor to insertions and queries. The constant factors measured during the evaluation can be further reduced if dedicated hardware for encryption is employed or if Oracle® supported efficient retrieval of cell coordinates.
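As a cross-check on the reported figures, the two factors measured in each experiment imply a third one, the overhead of encryption alone (NDE relative to OWE). The short calculation below assumes that each reported factor is a plain ratio of CPU times for the same workload, so that the ratios can be divided; the derived values are implied by the figures above and were not separately measured.

```python
# Reported factors: SPDE CPU time divided by the CPU time of the other system.
insert_spde_over_owe, insert_spde_over_nde = 12.62, 4.99
select_spde_over_owe, select_spde_over_nde = 15.86, 1.11

# NDE/OWE = (SPDE/OWE) / (SPDE/NDE): encryption without cell coordinates.
insert_nde_over_owe = insert_spde_over_owe / insert_spde_over_nde
select_nde_over_owe = select_spde_over_owe / select_spde_over_nde

print(f"insertions: NDE/OWE = {insert_nde_over_owe:.2f}")
print(f"selections: NDE/OWE = {select_nde_over_owe:.2f}")
```

Under that assumption, most of the selection overhead comes from decryption itself, while for insertions the retrieval of the next row-id accounts for a larger share than the encryption.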

Most commercial databases cache values that were recently accessed. However, the values are kept in the cache in the same way as they are kept in the database. For regular databases this makes no difference, but when applying database encryption, better performance can be achieved if values are kept decrypted in memory, thus avoiding some decryption operations.
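The caching idea can be illustrated with a small in-memory cache keyed by cell coordinates, so that a repeated read of the same cell skips decryption. This is a sketch only: the `DecryptedCache` class and the `decrypt` callback are hypothetical, the byte reversal used below is a placeholder for real decryption, and a real DBMS cache would also have to protect the plaintext it holds in memory.

```python
from typing import Callable, Dict, Tuple

Coords = Tuple[str, int, str]  # (table, row, column)

class DecryptedCache:
    def __init__(self, decrypt: Callable[[Coords, bytes], bytes]):
        self._decrypt = decrypt
        self._plain: Dict[Coords, bytes] = {}
        self.decryptions = 0  # how many times the cipher was actually invoked

    def read(self, coords: Coords, ciphertext: bytes) -> bytes:
        if coords not in self._plain:
            self._plain[coords] = self._decrypt(coords, ciphertext)
            self.decryptions += 1
        return self._plain[coords]

    def invalidate(self, coords: Coords) -> None:
        # Call after an update so that a stale plaintext is not served.
        self._plain.pop(coords, None)

# Placeholder decrypt (reverses the bytes) so that the sketch runs.
cache = DecryptedCache(lambda coords, ct: ct[::-1])
cell = ("T", 1, "C")
first = cache.read(cell, b"cba")
second = cache.read(cell, b"cba")
```

The second read is served from memory, so the decryption counter stays at one.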

While some embodiments of the invention have been described by way of illustration, it will be apparent that the invention can be put into practice with many modifications, variations and adaptations, and with the use of numerous equivalents or alternative solutions that are within the scope of persons skilled in the art, without departing from the spirit of the invention or exceeding the scope of the claims.

1. A Structure Preserving Database Encryption system for encrypting a content stored in cells of a database, comprising: a. a computer provided with a client software having access right definition to data stored in said database, wherein said client is used for communicating with said database by generating a communication session, and for allowing a person operating said client to retrieve data from said database; b. a computerized authentication server for identifying said client and for transferring one or more encryption keys to said client; and c. a computerized database server for encrypting data stored in each cell of a table within said database and for communicating with said client via said generated session, thereby providing said client according its access right definition decrypt data, wherein a value stored in a corresponding cell is determined such that the content of each cell in the database before the encryption includes a plaintext value, while after the encryption the content of each cell in said database includes a ciphertext value, and each of said cells within said database has a unique cell coordinate represented by table, row and column identifiers, wherein a concatenation function is activated on said cell table, row and column identifiers and as a result, a number based on said identifiers is obtained, and wherein a XOR operation between said number and said value stored in said cell is operated or a concatenation of said number with said value stored in said cell is performed.
2. A Structure Preserving Database Encryption method for encrypting a content of one or more cells in a database, wherein each of which of said cells having a unique cell coordinates represented by table, row and column identifiers in said database, comprising: a. generating a unique number for each of said cells according to the corresponding table, row and column identifiers of each of said cells; and b. encrypting a content of each of said cells with its corresponding generated unique number, while a structure of tables and indexes of said database remains as before the encryption which provides a transparent decryption process to a user; wherein the encryption of each cell value is performed by: a. determining a value stored in a corresponding cell; b. determining the position of said cell within a database by determining said cell table, row and column identifiers; c. activating a function concatenating said cell table, row and column identifiers and as a result, obtaining a number based on said identifiers; d. performing a XOR operation between said number and said value stored in said cell or concatenating said number with said value stored in said cell; and e. activating an encryption function on a result obtained from said XOR operation or from said concatenating of said number with said value stored in said cell, wherein the content of each cell in the database before the encryption includes a plaintext value, while after the encryption the content of each cell in said database includes a ciphertext value.
3. A method according to claim 2, wherein the decryption process comprises: a. identifying a client by means of an authentication server communicating over a conventional identification protocol; b. receiving one or more encryption keys from said authentication server by said client, wherein said one or more encryption keys being relevant for performing at least one query from said client, according to the access right definition of said client; c. generating a session by means of said client with a database server; d. transferring from said client to said database server the corresponding one or more encryption keys received from said authentication server; e. generating at least one query by said client; f. searching by means of said database server an encrypted database for the corresponding data requested in said at least one query; g. after finding said corresponding data, decrypting said corresponding data by means of said one or more corresponding encryption keys; and h. transferring the results of said at least one query from said database server to said client.
4. A method according to claim 2, further comprising activating a hash function on the generated unique number, thereby obtaining a hashed unique number.
5. A method according to claim 2, further comprising activating on the encrypted cell content a decryption function which decrypts the value encrypted within said cell, by performing a XOR operation between said decrypted value and the generated unique number for said cell.
6. A method according to claim 5, further comprising activating on an encrypted cell content a decryption function which decrypts the value encrypted within said cell, by performing a XOR operation between said decrypted value and the hashed unique number, or by performing discarding said hashed unique number from said decrypted value.
7. A method according to claim 2, further comprising allowing to define an encrypted index for each table in the database which containing the encrypted cell content.
8. A method according to claim 7, wherein the encrypted index for each table in said database, comprising the steps of: a. concatenating the content of each cell value in said table with a random number having a fixed number of bits or the row identifier of each cell in said table; and b. activating a nondeterministic encryption function on the result obtained from said concatenating, thereby generating one or more encrypted index entries each of which containing one or more encrypted indexed values.
9. A method according to claim 8, wherein the encrypted index for each table in said database further comprising the steps of: a. providing an entry self pointer which used as a node identifier of the corresponding index, said self pointer determines the position of the corresponding node in said corresponding index; b. obtaining an internal pointer to each encrypted index entry; c. obtaining an external pointer to a corresponding row in a table wherein said cell value is stored; d. encrypting said external pointer by a conventional encryption function; and e. activating a message authentication code function on the indexed value said three pointers, thereby calculating a message authentication code value.
10. A method according to claim 9, further comprising: a. defining a fixed size pool for each index, said pool holding one or more values for inserting into the corresponding index; and b. updating each of said indexes with the corresponding said one or more values, whenever said pool is full.
11. A method according to claim 10, further comprising extracting corresponding values from the corresponding pool to the corresponding index in a random order.
12. A method according to claim 8, further comprising executing a client's query in the encrypted index database, wherein said executed query is done by means of a database server using sub-indexes.
13. A method according to claim 12, wherein the executing of a client's query in the encrypted index database, comprising the steps of: a. connecting to a database server via said client and identifying said client; b. creating a secure session between said database server and said client; c. transferring one or more encryption keys by means of said client to said database server; d. submitting a query by means of said client to said database server; e. locating a corresponding sub-indexes which said client is entitled to access; f. executing said query on said corresponding sub-indexes by means of said database server using said one or more encryption keys; g. obtaining a result to said query; and h. transferring said obtained result to said client.